Conditional expectations given past observations in stationary
time series are usually estimated directly by kernel estimators, or by 
plugging in kernel estimators for transition densities. We show that, 
for linear and nonlinear autoregressive models driven by independent 
innovations, appropriate smoothed and weighted von Mises statistics 
of residuals estimate conditional expectations at better parametric 
rates and are asymptotically efficient. The proof is based on a uniform 
stochastic expansion for smoothed and weighted von Mises processes 
of residuals. We consider, in particular, estimation of conditional
distribution functions and of conditional quantile functions.

1. Introduction. Let $X_0, \dots, X_n$ be observations from a real-valued stationary time series. Conditional expectations $E(q(X_{n+m}) \mid X_n = x)$ with lag $m$ of some known function $q$ can be estimated by kernel estimators. For asymptotic results under various mixing conditions, we refer to [8, 23, 24, 29, 36, 37, 43, 44]. If the time series is first-order Markov with transition density $p(x,y)$, a conditional expectation of $q$ with lag one can be written $E(q(X_{n+1}) \mid X_n = x) = \int q(y)\,p(x,y)\,dy$, and it can be estimated by plugging in a kernel estimator $\hat p(x,y)$. Asymptotic results for such estimators of conditional expectations are in [9, 16, 26, 27, 28].

If the Markov chain follows a nonparametric autoregressive model $X_i = r(X_{i-1}) + \varepsilon_i$, with unknown autoregression function $r$ and independent and identically distributed (i.i.d.) mean zero innovations $\varepsilon_i$, then $E(q(X_{n+1}) \mid X_n = x) = E[q(\varepsilon_1 + r(x))]$. Let $\hat r$ denote a (kernel) estimator of the autoregression function. Write $\hat\varepsilon_i = X_i - \hat r(X_{i-1})$ for the residuals and $\hat{\mathbb F}(y) = \frac{1}{n}\sum_{i=1}^n \mathbf 1[\hat\varepsilon_i \le y]$ for the empirical distribution function based on them. The representation suggests estimating the conditional expectation by an empirical estimator

\[
\frac{1}{n}\sum_{i=1}^n q(\hat\varepsilon_i + \hat r(x)) = \int q(y + \hat r(x))\,d\hat{\mathbb F}(y). \tag{1.1}
\]

The convergence rate of (1.1) is given by the convergence rate of $\hat r$.

Suppose now that we have a linear or nonlinear parametric model $r = r_\vartheta$ for the autoregression function. In this case we can use an $n^{1/2}$-consistent estimator $\hat\vartheta$ for $\vartheta$ and the $n^{1/2}$-consistent estimator $\hat r = r_{\hat\vartheta}$ for $r$, and we can estimate the innovations $\varepsilon_i$ by $\hat\varepsilon_i = X_i - r_{\hat\vartheta}(X_{i-1})$. Under appropriate smoothness and integrability conditions on the function $q$, one can prove by a Taylor expansion that the resulting estimator (1.1) is $n^{1/2}$-consistent; see [33] for closely related details in a different problem. In particular, the estimator (1.1) converges at a faster rate than the nonparametric estimators. If $\hat\vartheta$ is asymptotically normal, so is (1.1). Such results could also be obtained for heteroscedastic autoregressive models $X_i = r_\vartheta(X_{i-1}) + s_\vartheta(X_{i-1})\varepsilon_i$, including ARCH models, and for GARCH models. For GARCH models and smooth $q$, one could use limit results for the empirical process of residuals obtained by Boldin [6, 7] and Berkes and Horváth [2, 3].
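As a rough illustration of the parametric version of (1.1), the following sketch simulates a linear AR(1) series, estimates $\vartheta$ by least squares, forms the residuals and averages $q(\hat\varepsilon_i + r_{\hat\vartheta}(x))$. The model $r_\vartheta(x) = \vartheta x$, the standard normal innovations, the least squares estimator and all function names are assumptions made for this example only; they are not prescribed by the text.

```python
import random


def simulate_ar1(theta, n, rng, burn_in=200):
    """Return X_0, ..., X_n from X_i = theta * X_{i-1} + eps_i, eps_i ~ N(0, 1)."""
    x = 0.0
    path = []
    for t in range(burn_in + n + 1):
        x = theta * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:          # keep only the last n + 1 values
            path.append(x)
    return path


def least_squares_ar1(path):
    """Least squares estimator of theta for r_theta(x) = theta * x (an assumed choice)."""
    num = sum(path[i] * path[i - 1] for i in range(1, len(path)))
    den = sum(path[i - 1] ** 2 for i in range(1, len(path)))
    return num / den


def residuals_ar1(path, theta_hat):
    """Residuals eps_hat_i = X_i - theta_hat * X_{i-1}, i = 1, ..., n."""
    return [path[i] - theta_hat * path[i - 1] for i in range(1, len(path))]


def empirical_estimator(q, resid, theta_hat, x):
    """Estimator (1.1): the average of q(eps_hat_i + r_theta_hat(x))."""
    shift = theta_hat * x
    return sum(q(e + shift) for e in resid) / len(resid)


rng = random.Random(0)                      # fixed seed, for illustration only
path = simulate_ar1(theta=0.5, n=500, rng=rng)
theta_hat = least_squares_ar1(path)
resid = residuals_ar1(path, theta_hat)
# q is the indicator of (-inf, 0], so the target is P(X_{n+1} <= 0 | X_n = 0.5)
estimate = empirical_estimator(lambda y: 1.0 if y <= 0.0 else 0.0, resid, theta_hat, 0.5)
```

In this setting the target value is $\Phi(-0.25) \approx 0.40$, so the estimate can be compared with a known quantity.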

Since the innovations are assumed to have mean zero, the residual-based empirical distribution function $\hat{\mathbb F}$ is not an efficient estimator of $F$. Thus, improvements over (1.1) are possible by replacing $\hat{\mathbb F}$ by an efficient estimator. Here efficiency is meant in the sense of a semiparametric version of Hájek and Le Cam's convolution theorem; see also Section 6. An efficient estimator of $F$ has been constructed in [32], but this estimator is not a distribution function. Alternative efficient estimators that are distribution functions are discussed in [18]. One such estimator is the weighted residual-based empirical distribution function

\[
\hat{\mathbb F}_w(y) = \frac{1}{n}\sum_{i=1}^n w_i \mathbf 1[\hat\varepsilon_i \le y], \qquad y \in \mathbb R,
\]

with an efficient estimator $\hat\vartheta$ and random weights $w_i$ chosen following the empirical likelihood approach of Owen [19, 20] so that, with probability tending to one, $\hat{\mathbb F}_w$ has mean zero, that is, $\int y\,d\hat{\mathbb F}_w(y) = (1/n)\sum_{i=1}^n w_i\hat\varepsilon_i = 0$. The resulting weighted version of (1.1) is the estimator

\[
\int q(y + r_{\hat\vartheta}(x))\,d\hat{\mathbb F}_w(y) = \frac{1}{n}\sum_{i=1}^n w_i\,q(\hat\varepsilon_i + r_{\hat\vartheta}(x)). \tag{1.2}
\]

This estimator is efficient if $\hat\vartheta$ is. This is a consequence of the fact that smooth functionals of efficient estimators are efficient. An alternative to weighting would be to subtract an appropriate "estimator of zero" from the estimator that corrects the influence function. See [15] and [12] for models with i.i.d. data; [17] for Markov chains; and [32, 33] for time series residuals. However, weighting has the advantage that, with high probability, the information of mean zero is used exactly, so we expect better small-sample properties.

Let us now look at some special cases in which simple alternative estimators are also available. For the conditional mean of lag one, for which $q(x) = x$, we have $E(X_{n+1} \mid X_n = x) = r_\vartheta(x)$. This can be estimated directly by $r_{\hat\vartheta}(x)$. This estimator is efficient if $\hat\vartheta$ is. The estimator (1.1) is $(1/n)\sum_{i=1}^n \hat\varepsilon_i + r_{\hat\vartheta}(x)$, which is $n^{1/2}$-consistent but is not efficient even if $\hat\vartheta$ is. The weighted estimator (1.2) equals the direct estimator $r_{\hat\vartheta}(x)$ with probability tending to one. Hence, it is efficient if $\hat\vartheta$ is. Another special case is the conditional second moment of lag one, for which $q(x) = x^2$. We have $E(X_{n+1}^2 \mid X_n = x) = E[\varepsilon_1^2] + r_\vartheta^2(x)$. The empirical estimator (1.1) is $(1/n)\sum_{i=1}^n (\hat\varepsilon_i + r_{\hat\vartheta}(x))^2$. It is $n^{1/2}$-consistent, but not efficient. A more direct $n^{1/2}$-consistent estimator is the plug-in estimator $(1/n)\sum_{i=1}^n \hat\varepsilon_i^2 + r_{\hat\vartheta}^2(x)$. However, it is not efficient in general even if $\hat\vartheta$ is, since it does not (fully) exploit the fact that the innovations have mean zero. Efficient estimators are given by the weighted empirical estimator and the (asymptotically equivalent) weighted plug-in estimator $(1/n)\sum_{i=1}^n w_i\hat\varepsilon_i^2 + r_{\hat\vartheta}^2(x)$, both with efficient $\hat\vartheta$.

Similar results are possible for lag two. The conditional expectation $E(q(X_{n+2}) \mid X_n = x)$ becomes $E[q(\varepsilon_2 + r_\vartheta(\varepsilon_1 + r_\vartheta(x)))]$ and can be estimated $n^{1/2}$-consistently by the von Mises statistic

\[
\iint q(z + r_{\hat\vartheta}(y + r_{\hat\vartheta}(x)))\,d\hat{\mathbb F}(y)\,d\hat{\mathbb F}(z) = \frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n q(\hat\varepsilon_j + r_{\hat\vartheta}(\hat\varepsilon_i + r_{\hat\vartheta}(x)))
\]

and the weighted von Mises statistic

\[
\iint q(z + r_{\hat\vartheta}(y + r_{\hat\vartheta}(x)))\,d\hat{\mathbb F}_w(y)\,d\hat{\mathbb F}_w(z) = \frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n w_i w_j\,q(\hat\varepsilon_j + r_{\hat\vartheta}(\hat\varepsilon_i + r_{\hat\vartheta}(x))).
\]

The latter will be efficient if an efficient estimator of $\vartheta$ is used. The von Mises statistics are easier to use than the usual kernel estimator because they do not require a choice of bandwidth. For certain $q$, simpler alternative estimators are available. For example, the conditional mean of lag two equals $E[r_\vartheta(\varepsilon_1 + r_\vartheta(x))]$ and can be estimated more directly by the average $(1/n)\sum_{i=1}^n r_{\hat\vartheta}(\hat\varepsilon_i + r_{\hat\vartheta}(x))$ or the weighted average $(1/n)\sum_{i=1}^n w_i\,r_{\hat\vartheta}(\hat\varepsilon_i + r_{\hat\vartheta}(x))$. The latter coincides with the weighted von Mises statistic with probability tending to one. A degenerate case would be the linear AR(1) model, with $r_\vartheta(x) = \vartheta x$, for which the conditional mean of lag two is $\vartheta^2 x$, which is estimated efficiently by $\hat\vartheta^2 x$ with $\hat\vartheta$ efficient for $\vartheta$. The weighted von Mises statistic coincides with this simple efficient estimator with probability tending to one. Simplified versions of the von Mises statistics are also available for estimating higher conditional moments of lag two. The conditional second moment of lag two simplifies to $E[\varepsilon_2^2] + E[r_\vartheta^2(\varepsilon_1 + r_\vartheta(x))]$ and can be estimated $n^{1/2}$-consistently by the average $(1/n)\sum_{i=1}^n (\hat\varepsilon_i^2 + r_{\hat\vartheta}^2(\hat\varepsilon_i + r_{\hat\vartheta}(x)))$ or the weighted average $(1/n)\sum_{i=1}^n w_i(\hat\varepsilon_i^2 + r_{\hat\vartheta}^2(\hat\varepsilon_i + r_{\hat\vartheta}(x)))$. The latter equals the weighted von Mises estimator with probability tending to one and is efficient if $\hat\vartheta$ is.
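The lag-two von Mises statistics above are plain double sums over the residuals. The sketch below computes them for a toy residual vector; the function name, the linear autoregression function and the numbers are hypothetical choices for illustration. The toy residuals are chosen to have mean zero, so uniform weights already satisfy the mean-zero constraint, and for $q(y) = y$ the statistic reduces to $\hat\vartheta^2 x$ as described for the linear AR(1) model.

```python
def von_mises_lag_two(q, r, resid, x, weights=None):
    """(1/n^2) * sum_i sum_j w_i w_j q(eps_j + r(eps_i + r(x))).

    With weights=None all weights equal one (the unweighted statistic).
    """
    n = len(resid)
    w = [1.0] * n if weights is None else weights
    rx = r(x)
    inner = [r(e + rx) for e in resid]      # r(eps_i + r(x)) for each i
    total = 0.0
    for wi, ri in zip(w, inner):
        for wj, ej in zip(w, resid):
            total += wi * wj * q(ej + ri)
    return total / n ** 2


theta_hat = 0.5                              # assumed parameter estimate
resid = [-1.0, 0.5, 0.25, 0.25]              # toy residuals with mean zero
r = lambda y: theta_hat * y                  # linear AR(1): r_theta(x) = theta * x

cond_mean = von_mises_lag_two(lambda y: y, r, resid, x=0.5)
cond_prob = von_mises_lag_two(lambda y: 1.0 if y <= 0.0 else 0.0, r, resid, x=0.5)
```

Here `cond_mean` equals `theta_hat ** 2 * 0.5`, while `cond_prob` is the fraction of pairs $(i,j)$ for which the simulated two-step value is at most zero.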

The above shows that conditional expectations of lags one and two can be estimated $n^{1/2}$-consistently and efficiently for smooth $q$ in nonlinear autoregression models of order one. To prove $n^{1/2}$-consistency of the estimator (1.1) for more general $q$, we need an appropriate balance of smoothness assumptions on $q$ and on the innovation distribution. For discontinuous $q$, we must assume that the innovations have a smooth density $f$. One may then also want to replace $\hat{\mathbb F}$ and $\hat{\mathbb F}_w$ by smoothed versions $\hat{\mathbb F}_s$ and $\hat{\mathbb F}_{sw}$, say, $d\hat{\mathbb F}_s(y) = \hat f(y)\,dy$ and $d\hat{\mathbb F}_{sw}(y) = \hat f_w(y)\,dy$, where $\hat f$ is a kernel estimator $\hat f(y) = (1/n)\sum_{i=1}^n k_{b_n}(y - \hat\varepsilon_i)$ of the density $f$ and $\hat f_w$ is a weighted kernel estimator $\hat f_w(y) = (1/n)\sum_{i=1}^n w_i k_{b_n}(y - \hat\varepsilon_i)$. Here $k_{b_n}(y) = k(y/b_n)/b_n$ for some kernel $k$ and some bandwidth $b_n$. These kernel estimators were studied in [18]. Efficiency of the smoothed and weighted residual-based empirical distribution function $\hat{\mathbb F}_{sw}$ was also shown there. The resulting smoothed and weighted von Mises statistic

\[
\iint q(z + r_{\hat\vartheta}(y + r_{\hat\vartheta}(x)))\,\hat f_w(y)\,dy\,\hat f_w(z)\,dz
\]

preserves $n^{1/2}$-consistency and efficiency even though the kernel estimators have a slower rate of convergence. Simulations show that smoothing improves the small-sample behavior of our estimator noticeably, especially if $q$ is not smooth (see Table 1). This is a second-order effect. For theoretical results in this direction, see [11]. We note that the choice of bandwidth is less critical here than for the usual kernel estimators. In particular, the asymptotic variance of our estimator does not depend on the choice of bandwidth in the allowed range.

The smoothed and weighted estimator

\[
\iint q(z + r_{\hat\vartheta}(y + r_{\hat\vartheta}(x)))\,\hat f_w(y)\,dy\,\hat f_w(z)\,dz
\]

equals

\[
\frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n w_i w_j \iint q(\hat\varepsilon_j + b_n u + r_{\hat\vartheta}(\hat\varepsilon_i + b_n v + r_{\hat\vartheta}(x)))\,k(u)\,du\,k(v)\,dv.
\]

Table 1
Simulated mean squared error for various von Mises estimators

Innovations     n       U       W   c=1.50  c=1.75  c=2.00  c=2.25  c=2.50  c=2.75
Normal         50    6181     967      512     462     430     414     411     417
              100    3153     460      299     279     266     261     264     273
              200    1615     227      168     160     156     155     160     168
Logistic       50    6184    1218      647     591     558     544     545     558
              100    3204     606      390     367     356     355     364     380
              200    1620     296      220     213     212     217     227     243
t(5)           50    6363    1513      803     738     701     686     690     706
              100    3234     756      495     470     459     461     474     495
              200    1646     375      281     274     275     283     299     320



The table entries are $10^{6} \times$ MSE of the von Mises estimator (U), the weighted von Mises estimator (W) and the smoothed and weighted von Mises estimator for different bandwidths $b_n = c\,n^{-1/4}$ with $c = 1.5, 1.75, 2, 2.25, 2.5, 2.75$. The simulations are based on 20,000 repetitions. We estimate the conditional probability $P(X_{n+2} \le 0 \mid X_n = 0.5)$ in the AR(1) model $X_i = \vartheta X_{i-1} + \varepsilon_i$ with $\vartheta = 0.5$ for sample sizes $n = 50, 100, 200$. The innovation distributions are the standard normal distribution, the logistic distribution and the $t$-distribution with five degrees of freedom, the latter two scaled to have variance one. As estimator of $\vartheta$, the sample autocorrelation coefficient was used. The standard error of a simulated MSE is about 1% of the MSE.



When the latter double integral is difficult to calculate, it can be approximated by Riemann sums, resulting in

\[
\frac{4}{(nN)^2}\sum_{i=1}^n\sum_{j=1}^n\sum_{s=1}^N\sum_{t=1}^N w_i w_j\,q(\hat\varepsilon_j + b_n u_s + r_{\hat\vartheta}(\hat\varepsilon_i + b_n u_t + r_{\hat\vartheta}(x)))\,k(u_s)\,k(u_t).
\]

Here $u_1, \dots, u_N$ denote the midpoints of a partition of the compact support $[-1,1]$ of the kernel $k$ into $N$ intervals of equal lengths, so that each interval has length $2/N$. This shows that the smoothed estimator is easy to compute.
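The Riemann-sum approximation can be sketched as follows. The triweight kernel is an assumed choice (it is a symmetric, twice continuously differentiable density on $[-1,1]$); the function names, bandwidth, number of grid points and toy residuals are hypothetical. Each of the two midpoint rules contributes a factor $2/N$, which gives the prefactor $4/(nN)^2$.

```python
def triweight(u):
    """Triweight kernel on [-1, 1]; an assumed choice of kernel k."""
    return 35.0 / 32.0 * (1.0 - u * u) ** 3 if abs(u) <= 1.0 else 0.0


def midpoints(N):
    """Midpoints u_1, ..., u_N of N equal subintervals of [-1, 1]."""
    return [-1.0 + (2 * s + 1) / N for s in range(N)]


def smoothed_weighted_von_mises(q, r, resid, weights, x, b, N=20):
    """Midpoint-rule approximation of the smoothed and weighted lag-two statistic."""
    n = len(resid)
    us = midpoints(N)
    ks = [triweight(u) for u in us]
    rx = r(x)
    total = 0.0
    for wi, ei in zip(weights, resid):
        for ut, kt in zip(us, ks):
            inner = r(ei + b * ut + rx)          # r(eps_i + b*u_t + r(x))
            for wj, ej in zip(weights, resid):
                for u_s, k_s in zip(us, ks):
                    total += wi * wj * q(ej + b * u_s + inner) * kt * k_s
    return 4.0 * total / (n * N) ** 2


theta_hat = 0.5                                  # assumed parameter estimate
r = lambda y: theta_hat * y                      # linear AR(1), for illustration
resid = [-1.0, 0.5, 0.25, 0.25]                  # toy residuals with mean zero
weights = [1.0] * len(resid)                     # uniform weights meet the constraint here

unit_mass = smoothed_weighted_von_mises(lambda y: 1.0, r, resid, weights, x=0.5, b=0.4)
cond_mean = smoothed_weighted_von_mises(lambda y: y, r, resid, weights, x=0.5, b=0.4)
```

With $q \equiv 1$ the result should be close to one, which is a quick check that the quadrature weights are scaled correctly. The cost is of order $n^2 N^2$ evaluations of $q$, so a vectorized implementation would be preferable for realistic sample sizes.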

Weighting can lead to drastic variance reductions, especially if $q$ is asymmetric, for example, for odd moments and for distribution functions. See Example 3.2 and Example 5.5, which treat smoothed and weighted von Mises statistics in the classical autoregressive model of order one. Example 3.2 reports a possible variance reduction of up to 64% for the one-lag conditional distribution function. Similar improvements through weighting are obtained for estimators of expectations under the innovation distribution; see [18], Sections 4 and 5. Example 5.5 shows that variance reductions of over 98% are possible in the case of estimating the lag-two conditional distribution function. The simulation results in Table 1 show that, for small to moderate sample sizes, the actual variance reductions might be even larger due to the second-order effect of smoothing.
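To make the comparison in Table 1 concrete, the following sketch computes relative MSE reductions of the weighted estimator (W) and of the best smoothed and weighted estimator over the unweighted estimator (U). The numbers are the table entries; the common scaling factor cancels in the ratios. The dictionary layout and function name are only illustrative, and note that these are reductions in simulated MSE, not in asymptotic variance.

```python
# Rows of Table 1: (innovation law, n) -> U, W and the six smoothed-and-weighted entries.
table_1 = {
    ("normal", 50): (6181, 967, [512, 462, 430, 414, 411, 417]),
    ("normal", 100): (3153, 460, [299, 279, 266, 261, 264, 273]),
    ("normal", 200): (1615, 227, [168, 160, 156, 155, 160, 168]),
    ("logistic", 50): (6184, 1218, [647, 591, 558, 544, 545, 558]),
    ("logistic", 100): (3204, 606, [390, 367, 356, 355, 364, 380]),
    ("logistic", 200): (1620, 296, [220, 213, 212, 217, 227, 243]),
    ("t5", 50): (6363, 1513, [803, 738, 701, 686, 690, 706]),
    ("t5", 100): (3234, 756, [495, 470, 459, 461, 474, 495]),
    ("t5", 200): (1646, 375, [281, 274, 275, 283, 299, 320]),
}


def relative_reduction(base, improved):
    """Fraction by which the MSE drops when moving from `base` to `improved`."""
    return 1.0 - improved / base


reductions = {
    key: (relative_reduction(u, w), relative_reduction(u, min(smoothed)))
    for key, (u, w, smoothed) in table_1.items()
}
weighted_gain, smoothed_gain = reductions[("normal", 200)]
```

For normal innovations and $n = 200$, weighting alone removes roughly 86% of the simulated MSE and the best smoothed version roughly 90%.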
It is the purpose of this paper to extend and sharpen the results on smoothed and weighted von Mises statistics outlined above in several directions: to linear and nonlinear autoregressive models of higher order, to conditional expectations with higher lags, to functions $q$ of more than one argument and to uniform results over classes of functions. We are particularly interested in estimating univariate and multivariate conditional distribution functions. They give rise to $n^{1/2}$-consistent estimators of conditional quantiles. Other applications are conditional probabilities of staying in a certain band, for example, $P(|X_{n+1} - x| < c_1, |X_{n+2} - x| < c_2 \mid X_n = x)$, or conditional probabilities that the time series increases over a certain period, for example, $P(X_{n+3} > X_{n+2} > X_{n+1} > x \mid X_n = x)$.

Specifically, we consider linear or nonlinear autoregressive models of order $p$,

\[
X_i = r_\vartheta(\mathbf X_{i-1}) + \varepsilon_i, \tag{1.3}
\]

with $\mathbf X_{i-1} = (X_{i-p}, \dots, X_{i-1})$ and $\vartheta$ a $d$-dimensional parameter, and we construct estimators for conditional expectations $E(q(X_{n+1}, \dots, X_{n+m}) \mid \mathbf X_n = \mathbf x)$ for some known function $q$ of $m$ arguments and some fixed vector $\mathbf x = (x_1, \dots, x_p)$. Using the representation of the autoregressive process, such conditional expectations can be written

\[
E(q(X_{n+1}, \dots, X_{n+m}) \mid \mathbf X_n = \mathbf x) = E[q(\varrho_\vartheta(\varepsilon_{n+1}, \dots, \varepsilon_{n+m}))]
\]

for some function $\varrho_\vartheta$. For lag two, that is, $m = 2$, we have $\varrho_\vartheta(\varepsilon_1, \varepsilon_2) = (\varepsilon_1 + r_\vartheta(\mathbf x),\ \varepsilon_2 + r_\vartheta(x_2, \dots, x_p, \varepsilon_1 + r_\vartheta(\mathbf x)))$. Let $\hat\vartheta$ be an $n^{1/2}$-consistent estimator of $\vartheta$. Using it, we can form the residuals $\hat\varepsilon_i = X_i - r_{\hat\vartheta}(\mathbf X_{i-1})$, $i = 1, \dots, n$. We estimate the conditional expectations by the smoothed and weighted von Mises statistic

\[
\int\!\cdots\!\int q(\varrho_{\hat\vartheta}(y_1, \dots, y_m)) \prod_{j=1}^m \hat f_w(y_j)\,dy_j.
\]

It is efficient if an efficient estimator $\hat\vartheta$ for $\vartheta$ is used. We obtain $n^{1/2}$-consistency and asymptotic normality not just for fixed $q$, but uniformly over large classes of functions. We show, in particular, that our estimator, viewed as a stochastic process indexed by $q$ and suitably standardized, converges to a Gaussian process. This is in contrast to the usual kernel estimators, for which limit theorems can hold only locally, in intervals shrinking in proportion to the bandwidth $b_n$.

Independence of innovations has recently also been exploited for other functionals. Schick and Wefelmeyer [33] use this idea to reduce the variance in estimating linear functionals of the stationary law of invertible linear processes. Saavedra and Cao [31] obtain an $n^{1/2}$-consistent estimator for the stationary density of an MA(1) process. Schick and Wefelmeyer [34] prove asymptotic efficiency of a modified version and Schick and Wefelmeyer [35] obtain functional central limit theorems in the case of MA($q$) processes, considering the density as an element of the function space $L_1$ or $C_0$. For nonparametric regression, Van Keilegom and Veraverbeke [41, 42] and Van Keilegom, Akritas and Veraverbeke [40] exploit independence of the error and the covariate to obtain improved estimators for the conditional density, distribution function and hazard rate of the response given the covariate.

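The paper is organized as follows. In Section 2 we derive a stochastic expansion for smoothed von Mises processes based on residuals,

\[
\psi(h, \hat f) = \int\!\cdots\!\int h(y_1, \dots, y_m) \prod_{j=1}^m \hat f(y_j)\,dy_j,
\]

and for weighted versions $\psi(h, \hat f_w)$, uniform over appropriate classes $\mathcal H$ of functions $h$. These are results of independent interest. To describe them, let $\tilde f(y) = \frac{1}{n}\sum_{i=1}^n k_{b_n}(y - \varepsilon_i)$ be the kernel estimator based on the actual innovations, and

\[
\bar h(y) = E(h(\varepsilon_1, \dots, \varepsilon_m) \mid \varepsilon_1 = y) + \cdots + E(h(\varepsilon_1, \dots, \varepsilon_m) \mid \varepsilon_m = y).
\]

The expansion of $\psi(h, \hat f)$ is of the form

\[
\psi(h, \hat f) - \psi(h, f) = \int \bar h(y)(\tilde f(y) - f(y))\,dy - D(\bar h)^\top(\hat\vartheta - \vartheta) + R_n(h),
\]

with $\sup_{h\in\mathcal H}|R_n(h)| = o_p(n^{-1/2})$. Here $D(\bar h) = E[\bar h(\varepsilon)\ell(\varepsilon)]\,E[\dot r_\vartheta(\mathbf X)]$, where $\ell = -f'/f$ is the score function for location of the innovation distribution, $\dot r_\vartheta(\mathbf X)$ is the gradient of $r_\vartheta(\mathbf X)$ with respect to $\vartheta$ and $(\mathbf X, \varepsilon)$ is short for $(\mathbf X_0, \varepsilon_1)$. The sign of the $D(\bar h)$ term is the one implied by Theorem 2.1, where the same quantity appears as $B(\bar h^*)$. The expansion of the weighted version differs as

\[
\psi(h, \hat f_w) = \psi(h, \hat f) - \sigma^{-2}E[\varepsilon\bar h(\varepsilon)]\Bigl(\frac{1}{n}\sum_{i=1}^n \varepsilon_i - E[\dot r_\vartheta(\mathbf X)]^\top(\hat\vartheta - \vartheta)\Bigr) + R_{nw}(h),
\]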

where again $\sup_{h\in\mathcal H}|R_{nw}(h)| = o_p(n^{-1/2})$. In the above expansions, the terms involving $\hat\vartheta - \vartheta$ come from replacing the estimated innovations by the true ones. Note that $\{n^{1/2}\int \bar h(y)(\tilde f(y) - f(y))\,dy : h \in \mathcal H\}$ is a smoothed empirical process. Such processes have been studied by Yukich [46], van der Vaart [38], Rost [25] and Radulović and Wegkamp [21, 22]. They give conditions under which the smoothed empirical process is asymptotically equivalent to the usual empirical process. We refer to the book by van der Vaart and Wellner [39] for a general overview of empirical processes. We have an envelope and Lebesgue densities and give, in Propositions 2.1 and 2.2, versions of the results of van der Vaart [38] and Rost [25] with simpler assumptions. Together with the above expansions, these results imply that if $\hat\vartheta$ is asymptotically linear, then so are the von Mises process $\{n^{1/2}(\psi(h, \hat f) - \psi(h, f)) : h \in \mathcal H\}$ and its weighted version $\{n^{1/2}(\psi(h, \hat f_w) - \psi(h, f)) : h \in \mathcal H\}$. This implies that these processes converge weakly to tight Gaussian processes. For our applications to estimation of conditional expectations, we need versions in which the function $h$ is indexed by $\vartheta$ and $q$. We formulate such results in Theorem 2.2.

U. U. MULLER, A. SCHICK AND W. WEFELMEYER

In Sections 3 to 5 we apply our results on von Mises processes to estimation of conditional expectations of lags one and two. We get by with mild assumptions on the innovation density and the autoregression function. In particular, we cover discontinuous autoregression functions such as those appearing in self-exciting threshold autoregressive (SETAR) models. Higher lags can be treated along these lines, but the stochastic expansions of the estimators are notationally cumbersome. Theorem 3.1 specializes Theorem 2.2 to the case of estimating conditional expectations of lag one. In Theorems 3.2 and 3.3 we apply Theorem 3.1 to estimators for conditional distribution functions and for the conditional expectation of a fixed function $q$. Theorems 4.1, 5.1 and 5.2 give analogous results for conditional expectations and conditional distribution functions of lag two. Examples 3.1 and 5.4 apply these results to conditional quantile processes of lags one and two. Our results are new and nontrivial, even for the linear autoregressive model of order one.

In Section 6 we show that the weighted versions of our estimators are efficient if an efficient estimator for $\vartheta$ is used. This is done by checking that the influence function then equals the efficient influence function for estimating $\psi(h_\vartheta, f)$ with $h_\vartheta = q \circ \varrho_\vartheta$. Efficient estimators for $\vartheta$ in nonlinear autoregression with mean zero innovations are constructed in [14].

Section 7 contains two technical lemmas. Lemma 7.1 gives a characterization of compact subsets of $L_2(\nu)$ for measures $\nu$ with Lebesgue density. It says that a closed subset of $L_2(\nu)$ with an envelope translation-continuous at zero is compact if and only if the subset is equi-translation-continuous at zero. Lemma 7.2 gives conditions for uniform differentiability of integrals with respect to Hellinger differentiable densities.

2. Smoothed and weighted von Mises processes of residuals. Consider observations $X_{1-p}, \dots, X_n$ from a stationary and ergodic nonlinear autoregressive process $X_i = r_\vartheta(\mathbf X_{i-1}) + \varepsilon_i$ of order $p$, where $\mathbf X_{i-1} = (X_{i-p}, \dots, X_{i-1})$ and $\vartheta$ is a $d$-dimensional parameter. Assume that the innovations $\varepsilon_i$ are i.i.d. with mean zero, finite variance $\sigma^2$ and positive density $f$ and are independent of $\mathbf X_0$. Let $\hat\vartheta$ be an $n^{1/2}$-consistent estimator for $\vartheta$. Estimate the innovations $\varepsilon_i$ by residuals $\hat\varepsilon_i = X_i - r_{\hat\vartheta}(\mathbf X_{i-1})$ and the innovation density $f$ by the kernel estimator

\[
\hat f(y) = \frac{1}{n}\sum_{i=1}^n k_{b_n}(y - \hat\varepsilon_i)
\]

and by its weighted version

\[
\hat f_w(y) = \frac{1}{n}\sum_{i=1}^n w_i k_{b_n}(y - \hat\varepsilon_i),
\]

where $k_{b_n}(y) = k(y/b_n)/b_n$ for a kernel $k$ and a bandwidth $b_n$. Following Owen [19, 20], we choose positive weights $w_i$ of the form

\[
w_i = \frac{1}{1 + \lambda\hat\varepsilon_i},
\]

where $\lambda$ is chosen such that $\sum_{i=1}^n w_i\hat\varepsilon_i = 0$. By Müller, Schick and Wefelmeyer [18], this is possible with probability tending to one. When there is no solution, we set $\lambda = 0$.
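One way to compute such weights numerically is sketched below. It relies on the fact that $\lambda \mapsto \sum_i \hat\varepsilon_i/(1 + \lambda\hat\varepsilon_i)$ is strictly decreasing on the interval where all $1 + \lambda\hat\varepsilon_i$ are positive, so a root exists and is unique whenever the residuals take both signs. The bisection solver, tolerance and function name are assumptions of this sketch, not the procedure of [18] or [19, 20].

```python
def empirical_likelihood_weights(resid, tol=1e-12, max_iter=200):
    """Weights w_i = 1 / (1 + lam * e_i) with sum_i w_i * e_i = 0, found by bisection.

    Falls back to lam = 0 (all weights one) when the residuals do not
    take both signs, in which case no positive solution exists.
    """
    e_min, e_max = min(resid), max(resid)
    if not (e_min < 0.0 < e_max):
        return [1.0] * len(resid)

    def constraint(lam):
        return sum(e / (1.0 + lam * e) for e in resid)

    # All 1 + lam * e_i stay positive for lam strictly between -1/e_max and -1/e_min.
    lower, upper = -1.0 / e_max, -1.0 / e_min
    width = upper - lower
    a, b = lower + 1e-10 * width, upper - 1e-10 * width
    for _ in range(max_iter):
        mid = 0.5 * (a + b)
        if constraint(mid) > 0.0:    # the constraint is decreasing, so the root lies right
            a = mid
        else:
            b = mid
        if b - a < tol:
            break
    lam = 0.5 * (a + b)
    return [1.0 / (1.0 + lam * e) for e in resid]


resid = [-1.2, 0.4, 0.9, 0.3, -0.1]              # toy residuals with nonzero mean
weights = empirical_likelihood_weights(resid)
weighted_mean = sum(w * e for w, e in zip(weights, resid)) / len(resid)
```

A useful side effect of this form of the weights is that they automatically sum to $n$ once the mean-zero constraint holds, since $\sum_i w_i = n - \lambda\sum_i w_i\hat\varepsilon_i$.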

In this section we obtain a uniform stochastic expansion for smoothed von Mises processes based on residuals $\hat\varepsilon_1, \dots, \hat\varepsilon_n$,

\[
\psi(h, \hat f) = \int\!\cdots\!\int h(y_1, \dots, y_m) \prod_{j=1}^m \hat f(y_j)\,dy_j,
\]

and their weighted versions $\psi(h, \hat f_w)$. Here the index $h$ runs through a family $\mathcal H$ of functions from $\mathbb R^m$ to $\mathbb R$ with envelope $H$, that is, $|h| \le H$ for all $h \in \mathcal H$. We assume the envelope to be of the form

\[
H(y_1, \dots, y_m) = V(y_1)\cdots V(y_m), \tag{2.1}
\]

where $V$ is a measurable function satisfying the following conditions.

Assumption V. The function $V$ satisfies $V \ge 1$ and, for some $\alpha > 1$,

\[
\int (1 + |y|)^{\alpha} V^2(y) f(y)\,dy < \infty.
\]

Moreover, the function $D$ defined by

\[
D(s) = \sup_{y\in\mathbb R}\frac{|V(y+s) - V(y)|}{V(y)}
\]

is bounded on compacts and is continuous at 0,

\[
D(s) \to 0 \quad\text{as } s \to 0. \tag{2.2}
\]

If $V = 1$, then Assumption V is satisfied with $\alpha = 2$. Another example of a function satisfying Assumption V is $V(y) = (1 + |y|)^{\gamma}$ with $\gamma > 0$, provided $\int |y|^{2\gamma+\alpha} f(y)\,dy$ is finite for some $\alpha > 1$.

Write $\varepsilon$ and $\mathbf X$ for random variables with the same joint distribution as $\varepsilon_i$ and $\mathbf X_{i-1}$. Denote the distribution functions of $\varepsilon$ and $\mathbf X$ by $F$ and $G$. We make the following assumptions on the density $f$ and the autoregression function $r_\vartheta$.

Assumption F. The density $f$ has finite Fisher information for location, that is, $f$ is absolutely continuous with almost everywhere derivative $f'$, and $E[\ell^2(\varepsilon)] = \int \ell^2\,dF$ is finite, where $\ell = -f'/f$.

Assumption R. The function $\tau \mapsto r_\tau(\mathbf x)$ is continuously differentiable for all $\mathbf x$ with gradient $\tau \mapsto \dot r_\tau(\mathbf x)$. For each constant $C$,

\[
\sup_{|\tau-\vartheta|\le Cn^{-1/2}} \sum_{i=1}^n \bigl(r_\tau(\mathbf X_{i-1}) - r_\vartheta(\mathbf X_{i-1}) - \dot r_\vartheta(\mathbf X_{i-1})^\top(\tau - \vartheta)\bigr)^2 = O_p(n^{-2/3}). \tag{2.3}
\]

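Moreover, $E[|\dot r_\vartheta(\mathbf X)|^{5/2}] = \int |\dot r_\vartheta|^{5/2}\,dG < \infty$ and the matrix $E[\dot r_\vartheta(\mathbf X)\dot r_\vartheta(\mathbf X)^\top] = \int \dot r_\vartheta\dot r_\vartheta^\top\,dG$ is positive definite.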

A sufficient condition for (2.3) is a Hölder condition with exponent 2/3 on the gradient $\dot r_\tau$,

\[
|\dot r_\tau(\mathbf x) - \dot r_\vartheta(\mathbf x)| \le |\tau - \vartheta|^{2/3} A(\mathbf x),
\]

with $A \in L_2(G)$.

Finally, we impose the following assumptions on the kernel and the bandwidth. Recall that $d$ is the dimension of the parameter $\vartheta$.

Assumption K. The kernel k is a symmetric and twice continuously 
differentiable density with compact support [—1,1]. 

Assumption B. The bandwidth $b_n$ satisfies $nb_n^4 \to 0$ and $nb_n^{d_*} \to \infty$ with $d_* = (50 + 20d)/(14 + 5d)$.

The requirement on the bandwidth is satisfied by $b_n \sim n^{-\beta}$ for any $\beta$ satisfying $1/4 < \beta < 1/d_*$. Another possibility is $b_n \sim (n\log n)^{-1/4}$.
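The admissible range of bandwidth exponents is quite narrow, which a short computation makes visible. The sketch below evaluates $d_* = (50 + 20d)/(14 + 5d)$ exactly and returns the interval $(1/4, 1/d_*)$ for $\beta$; the function names are hypothetical.

```python
from fractions import Fraction


def d_star(d):
    """Exponent d_* = (50 + 20 d) / (14 + 5 d) from Assumption B, as an exact fraction."""
    return Fraction(50 + 20 * d, 14 + 5 * d)


def beta_range(d):
    """Open interval (1/4, 1/d_*) of exponents beta for which b_n = n^(-beta) is allowed."""
    return Fraction(1, 4), 1 / d_star(d)


lower, upper = beta_range(1)      # one-dimensional parameter
# lower = 1/4 and upper = 19/70, so beta must lie between 0.25 and about 0.2714
```

Since $d_* < 4$ for every $d$, the interval is never empty, but its length shrinks as the parameter dimension $d$ grows.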

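In Theorem 2.1 below we describe expansions of $\psi(h, \hat f)$ and $\psi(h, \hat f_w)$. For this, we define, for $h \in \mathcal H$, a function $\bar h = h_1 + \cdots + h_m$ by

\[
h_j(y_j) = \int\!\cdots\!\int h(y_1, \dots, y_m) \prod_{k\ne j} f(y_k)\,dy_k.
\]

Note that $h_j(\varepsilon_j) = E(h(\varepsilon_1, \dots, \varepsilon_m) \mid \varepsilon_j)$. For a measurable function $g$, we define the $V$-norm by

\[
\|g\|_V = \int V(y)|g(y)|\,dy.
\]

It follows from Assumptions V and F that $f'$ has finite $V$-norm. Indeed, one has

\[
\|f'\|_V^2 = \bigl(E[V(\varepsilon)|\ell(\varepsilon)|]\bigr)^2 \le E[V^2(\varepsilon)]\,E[\ell^2(\varepsilon)].
\]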
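Recall that

\[
\tilde f(y) = \frac{1}{n}\sum_{i=1}^n k_{b_n}(y - \varepsilon_i)
\]

denotes the kernel density estimator based on the true innovations. For $g \in L_2(F)$, set

\[
B(g) = E[g(\varepsilon)\ell(\varepsilon)]\,E[\dot r_\vartheta(\mathbf X)], \qquad U_n(g) = \int g(y)\tilde f(y)\,dy - \frac{1}{n}\sum_{i=1}^n g(\varepsilon_i),
\]

let $g^*$ denote the projection of $g$ onto the subspace $\{v \in L_2(F) : \int v(y)f(y)\,dy = 0\}$, and let $g^{\#}$ denote the projection of $g$ onto the subspace

\[
\mathcal V = \Bigl\{v \in L_2(F) : \int v(y)f(y)\,dy = \int y\,v(y)f(y)\,dy = 0\Bigr\}.
\]

It is easy to check that $g^*(y) = g(y) - E[g(\varepsilon)]$ and

\[
g^{\#}(y) = g(y) - E[g(\varepsilon)] - \sigma^{-2}E[\varepsilon g(\varepsilon)]\,y, \qquad y \in \mathbb R.
\]

Since $E[\ell(\varepsilon)] = 0$ and $E[\varepsilon\ell(\varepsilon)] = 1$, we have $\ell^{\#}(\varepsilon) = \ell(\varepsilon) - \sigma^{-2}\varepsilon$. Note that $E[g(\varepsilon)\ell^{\#}(\varepsilon)] = E[g^{\#}(\varepsilon)\ell(\varepsilon)]$. Also, $E[g^*(\varepsilon)\ell(\varepsilon)] = E[g(\varepsilon)\ell(\varepsilon)]$ and $B(g^*) = B(g)$.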
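The two projections can be checked on a small discrete example. The sketch below uses a hypothetical three-point innovation distribution with mean zero (discrete only so that expectations are exact finite sums; the text itself assumes a density) and verifies that $g^{\#}$ is orthogonal both to constants and to the identity, while $g^*$ is only centered.

```python
def expectation(func, values, probs):
    """E[func(eps)] for a discrete distribution with the given support and probabilities."""
    return sum(p * func(v) for v, p in zip(values, probs))


def projections(g, values, probs):
    """Return (g_star, g_sharp) for a mean-zero distribution on `values`."""
    mean_g = expectation(g, values, probs)
    sigma2 = expectation(lambda y: y * y, values, probs)
    cross = expectation(lambda y: y * g(y), values, probs)
    g_star = lambda y: g(y) - mean_g
    g_sharp = lambda y: g(y) - mean_g - cross / sigma2 * y
    return g_star, g_sharp


values = [-1.0, 0.0, 1.0]                     # toy mean-zero innovation law
probs = [0.25, 0.5, 0.25]
g = lambda y: 1.0 if y <= 0.0 else 0.0        # indicator, as for a distribution function

g_star, g_sharp = projections(g, values, probs)
mean_star = expectation(g_star, values, probs)
mean_sharp = expectation(g_sharp, values, probs)
cross_star = expectation(lambda y: y * g_star(y), values, probs)
cross_sharp = expectation(lambda y: y * g_sharp(y), values, probs)
```

The nonzero value of `cross_star` is exactly the component that weighting removes, which is why asymmetric functions such as indicators benefit most.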

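Our expansions rely on the following lemma which summarizes results of Müller, Schick and Wefelmeyer [18], namely, their Theorems 3.1-3.3.

Lemma 2.1. Suppose Assumptions B, F, K, R and V hold. Then $\|\hat f - f\|_V = o_p(n^{-1/4})$ and $\|\hat f - \tilde f - f'E[\dot r_\vartheta(\mathbf X)]^\top(\hat\vartheta - \vartheta)\|_V = o_p(n^{-1/2})$. Moreover, $\|\hat f_w - f\|_V = o_p(n^{-1/4})$ and, with $\zeta(y) = yf(y)$,

\[
\Bigl\|\hat f_w - \tilde f + \sigma^{-2}\zeta\,\frac{1}{n}\sum_{i=1}^n \varepsilon_i + \ell^{\#}f\,E[\dot r_\vartheta(\mathbf X)]^\top(\hat\vartheta - \vartheta)\Bigr\|_V = o_p(n^{-1/2}).
\]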



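Theorem 2.1. Suppose Assumptions B, F, K, R and V hold. Then

\[
\sup_{h\in\mathcal H}\Bigl|\psi(h, \hat f) - \psi(h, f) - \frac{1}{n}\sum_{i=1}^n \bar h^*(\varepsilon_i) + B(\bar h^*)^\top(\hat\vartheta - \vartheta) - U_n(\bar h)\Bigr| = o_p(n^{-1/2}),
\]

\[
\sup_{h\in\mathcal H}\Bigl|\psi(h, \hat f_w) - \psi(h, f) - \frac{1}{n}\sum_{i=1}^n \bar h^{\#}(\varepsilon_i) + B(\bar h^{\#})^\top(\hat\vartheta - \vartheta) - U_n(\bar h)\Bigr| = o_p(n^{-1/2}).
\]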
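Proof. We prove only the second conclusion. For a subset $A$ of $\{1, \dots, m\}$, let

\[
\phi_A(\mathbf y) = \prod_{j\in A}(\hat f_w(y_j) - f(y_j)) \prod_{j\notin A} f(y_j), \qquad \mathbf y = (y_1, \dots, y_m).
\]

Setting $\phi_r(\mathbf y) = \sum_{|A|=r}\phi_A(\mathbf y)$, we have

\[
\prod_{j=1}^m \hat f_w(y_j) = \prod_{j=1}^m \bigl(f(y_j) + \hat f_w(y_j) - f(y_j)\bigr) = \sum_{A\subset\{1,\dots,m\}}\phi_A(\mathbf y) = \sum_{r=0}^m \phi_r(\mathbf y).
\]

Note that

\[
\phi_0(\mathbf y) = \prod_{j=1}^m f(y_j), \qquad \phi_1(\mathbf y) = \sum_{j=1}^m (\hat f_w(y_j) - f(y_j)) \prod_{k\ne j} f(y_k).
\]

Thus,

\[
\int h(\mathbf y)\phi_0(\mathbf y)\,d\mathbf y = \psi(h, f) \quad\text{and}\quad \int h(\mathbf y)\phi_1(\mathbf y)\,d\mathbf y = \int \bar h(y)(\hat f_w(y) - f(y))\,dy.
\]

Using (2.1), we obtain

\[
\sup_{h\in\mathcal H}\Bigl|\sum_{r=2}^m \int h(\mathbf y)\phi_r(\mathbf y)\,d\mathbf y\Bigr| \le \sum_{r=2}^m \int H(\mathbf y)|\phi_r(\mathbf y)|\,d\mathbf y = \sum_{r=2}^m \binom{m}{r}\|\hat f_w - f\|_V^{r}\,\|f\|_V^{m-r}.
\]

Since $\|\hat f_w - f\|_V = o_p(n^{-1/4})$, we obtain

\[
\sup_{h\in\mathcal H}\Bigl|\psi(h, \hat f_w) - \psi(h, f) - \int \bar h(y)(\hat f_w(y) - f(y))\,dy\Bigr| = o_p(n^{-1/2}).
\]

Note that $|\bar h| \le C_m V$ with $C_m = m\|f\|_V^{m-1}$. Thus, by the last assertion of Lemma 2.1, $\sup_{h\in\mathcal H}|R_n(h)| = o_p(n^{-1/2})$, where

\[
R_n(h) = \int \bar h(y)\Bigl(\hat f_w(y) - \tilde f(y) + \sigma^{-2}\zeta(y)\,\frac{1}{n}\sum_{i=1}^n \varepsilon_i + \ell^{\#}(y)f(y)\,E[\dot r_\vartheta(\mathbf X)]^\top(\hat\vartheta - \vartheta)\Bigr)\,dy.
\]

Moreover, $\int \bar h(y)(\tilde f(y) - f(y))\,dy = U_n(\bar h) + \frac{1}{n}\sum_{i=1}^n \bar h^*(\varepsilon_i)$ and $\bar h^*(y) - \sigma^{-2}E[\varepsilon\bar h(\varepsilon)]\,y = \bar h^{\#}(y)$. Since $E[\bar h(\varepsilon)\ell^{\#}(\varepsilon)]\,E[\dot r_\vartheta(\mathbf X)] = B(\bar h^{\#})$, the desired result follows. □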

In order to obtain functional central limit theorems for the smoothed von Mises process $\{n^{1/2}(\psi(h, \hat f) - \psi(h, f)) : h \in \mathcal H\}$ based on the residuals and for its weighted version, we can now apply results on smoothed empirical processes $\{n^{1/2}\int g(y)(\tilde f(y) - f(y))\,dy : g \in \mathcal G\}$ based on the innovations. This also requires an estimator $\hat\vartheta$ that is asymptotically linear in the sense that

\[
\hat\vartheta = \vartheta + \frac{1}{n}\sum_{i=1}^n \varphi(\mathbf X_{i-1}, \varepsilon_i) + o_p(n^{-1/2}) \tag{2.4}
\]

with influence function $\varphi(\mathbf X, \varepsilon)$ satisfying $E(\varphi(\mathbf X, \varepsilon) \mid \mathbf X) = 0$ and $E[|\varphi(\mathbf X, \varepsilon)|^2] < \infty$. Typically, $\varphi$ is orthogonal to $\mathcal V$ in the sense that $E[\varphi(\mathbf X, \varepsilon)v(\varepsilon)] = 0$ for all $v \in \mathcal V$.

In the literature one decomposes $n^{1/2}\int g(y)(\tilde f(y) - f(y))\,dy$ into a variance term

\[
n^{1/2}\int g(y)(\tilde f(y) - f * k_{b_n}(y))\,dy
\]

and a bias term

\[
n^{1/2}\int g(y)(f * k_{b_n}(y) - f(y))\,dy.
\]

One assumes that the bias term tends to zero uniformly in $g$,

\[
\sup_{g\in\mathcal G}\Bigl|n^{1/2}\int g(y)(f * k_{b_n}(y) - f(y))\,dy\Bigr| \to 0. \tag{2.5}
\]

Sufficient conditions for this analytic property are easily given in terms of smoothness of $f$ and an appropriate bandwidth $b_n$. For example, (2.5) holds if $nb_n^4 \to 0$ and

\[
\sup_{g\in\mathcal G}\Bigl|\int g(y)(f(y - s) - f(y) + sf'(y))\,dy\Bigr| = O(s^2).
\]

Indeed, for a symmetric kernel the linear term integrates to zero, so the bias term is of order $n^{1/2}b_n^2$. To deal with the variance term, van der Vaart ([38], (1.1)) and Rost ([25], (2.7)) use a condition that in our case is

\[
\sup_{g\in\mathcal G}\int\Bigl(\int (g(y + b_n u) - g(y))k(u)\,du\Bigr)^2 f(y)\,dy \to 0. \tag{2.6}
\]

van der Vaart [38] shows that if $\mathcal G$ is Donsker and translation invariant, then conditions (2.5) and (2.6) imply that the smoothed empirical process converges weakly in $\ell_\infty(\mathcal G)$ to a tight Brownian bridge process. Inspection of his proof shows that we can remove translation invariance if we strengthen $\mathcal G$ being Donsker to $\mathcal G_\eta = \{g(\cdot + s) : |s| \le \eta, g \in \mathcal G\}$ being Donsker for some $\eta > 0$.

Suppose now that $\mathcal G$ has an envelope $V \in L_2(F)$ satisfying

\[
\int (V(y + s) - V(y))^2 f(y)\,dy \to 0 \quad\text{as } s \to 0. \tag{2.7}
\]

Then condition (2.6) holds if $\mathcal G$ is totally bounded in $L_2(F)$. This follows from the characterization of compact subsets of $L_2(\nu)$ for finite measures $\nu$ with Lebesgue density given in Lemma 7.1. If $\mathcal G$ is Donsker, then $\mathcal G$ is totally bounded in $L_2(F)$ and, hence, condition (2.6) holds. We therefore obtain the following version of the Theorem in [38].
Proposition 2.1. Suppose $\mathcal G_\eta$ is Donsker for some $\eta > 0$ and has envelope $V \in L_2(F)$ satisfying condition (2.7). Then condition (2.5) implies that

\[
\sup_{g\in\mathcal G}|U_n(g)| = \sup_{g\in\mathcal G}\Bigl|\int g(y)\tilde f(y)\,dy - \frac{1}{n}\sum_{i=1}^n g(\varepsilon_i)\Bigr| = o_p(n^{-1/2}) \tag{2.8}
\]

and that the smoothed empirical process converges weakly in $\ell_\infty(\mathcal G)$ to a tight Brownian bridge process.



One can derive from [25] that $\mathcal G_\eta$ Donsker can be replaced by the condition that $\mathcal G$ has uniformly integrable $L_2$-entropy. In his Theorem 2.2, Rost uses condition (2.5) with $\mathcal G$ replaced by $\mathcal G \cup \{V^2\}$ and (2.6). Since $\mathcal G$ is totally bounded in $L_2(F)$ if it has uniformly integrable $L_2$-entropy, condition (2.6) is implied by (2.7). Condition (2.5) with $\mathcal G = \{V^2\}$ is used only to conclude that $\int V(\varepsilon + b_n u)k(u)\,du$ is uniformly integrable. But the latter follows from condition (2.7). Hence, we have the following version of Rost's Theorem 2.2.

Proposition 2.2. If $\mathcal G$ has uniformly integrable $L_2$-entropy and envelope $V \in L_2(F)$ satisfying (2.7), then condition (2.5) implies (2.8) and the smoothed empirical process converges weakly in $\ell_\infty(\mathcal G)$ to a tight Brownian bridge process.



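We can now combine Theorem 2.1 and Proposition 2.1 to obtain functional central limit theorems for the von Mises statistics $\psi(h, \hat f)$ and $\psi(h, \hat f_w)$. We consider only the weighted version, $\psi(h, \hat f_w)$. Assume that $\hat\vartheta$ is asymptotically linear in the sense of (2.4), with influence function $\varphi$ orthogonal to $\mathcal V$. By Theorem 2.1 and Proposition 2.1, $\psi(h, \hat f_w)$ is uniformly asymptotically linear,

\[
\sup_{h\in\mathcal H}\Bigl|\psi(h, \hat f_w) - \psi(h, f) - \frac{1}{n}\sum_{i=1}^n s_h(\mathbf X_{i-1}, \varepsilon_i)\Bigr| = o_p(n^{-1/2}),
\]

with influence function $s_h(\mathbf X, \varepsilon) = \bar h^{\#}(\varepsilon) - B(\bar h^{\#})^\top\varphi(\mathbf X, \varepsilon)$.

It follows that $\{n^{1/2}(\psi(h, \hat f_w) - \psi(h, f)) : h \in \mathcal H\}$ converges weakly in $\ell_\infty(\mathcal H)$ to a centered Gaussian process with covariance function

\[
\operatorname{Cov}(h, k) = E[s_h(\mathbf X, \varepsilon)s_k(\mathbf X, \varepsilon)] = E[\bar h^{\#}(\varepsilon)\bar k^{\#}(\varepsilon)] + B(\bar h^{\#})^\top E[\varphi(\mathbf X, \varepsilon)\varphi(\mathbf X, \varepsilon)^\top]B(\bar k^{\#}).
\]

We have

\[
E[\bar h^{\#}(\varepsilon)\bar k^{\#}(\varepsilon)] = E[\bar h(\varepsilon)\bar k(\varepsilon)] - E[\bar h(\varepsilon)]E[\bar k(\varepsilon)] - \sigma^{-2}E[\varepsilon\bar h(\varepsilon)]E[\varepsilon\bar k(\varepsilon)],
\]

\[
B(\bar h^{\#}) = \bigl(E[\bar h(\varepsilon)\ell(\varepsilon)] - \sigma^{-2}E[\varepsilon\bar h(\varepsilon)]\bigr)E[\dot r_\vartheta(\mathbf X)].
\]

An $n^{1/2}$-consistent estimator of the covariance function is obtained using residual-based empirical estimators for $E[\bar h^{\#}(\varepsilon)\bar k^{\#}(\varepsilon)]$ and $B(\bar k^{\#})$ and an appropriate estimator of the asymptotic variance of $\hat\vartheta$. Note that the term of the form $E[\bar h(\varepsilon)\ell(\varepsilon)]$ could be written $\partial_{s=0}E[\bar h(\varepsilon + s)]$, so estimation of $\ell$ could be avoided.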
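The identity $E[\bar h(\varepsilon)\ell(\varepsilon)] = \partial_{s=0}E[\bar h(\varepsilon + s)]$ can be checked numerically in a simple case. The sketch below assumes standard normal innovations, for which $\ell(y) = y$, and takes the indicator $\bar h = \mathbf 1[\cdot \le c]$ as a stand-in; both choices, the grid sizes and the function names are illustrative assumptions. The left side is computed by a midpoint rule and the right side by a central difference of $s \mapsto \Phi(c - s)$.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def score_moment(c, grid=20000, lower=-10.0):
    """E[1(eps <= c) * l(eps)] for eps ~ N(0, 1), where l(y) = y, by the midpoint rule."""
    step = (c - lower) / grid
    total = 0.0
    for k in range(grid):
        y = lower + (k + 0.5) * step
        total += y * normal_pdf(y)
    return total * step


def shift_derivative(c, delta=1e-4):
    """Central difference for d/ds E[1(eps + s <= c)] at s = 0; the mean is Phi(c - s)."""
    return (normal_cdf(c - delta) - normal_cdf(c + delta)) / (2.0 * delta)


c = 0.3
lhs = score_moment(c)          # uses the score function explicitly
rhs = shift_derivative(c)      # uses only shifted expectations
```

Both quantities equal $-\phi(c)$, where $\phi$ is the standard normal density, which illustrates why a shift derivative can replace an explicit estimate of the score function.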

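In our applications to estimation of distribution functions and conditional expectations, the class $\mathcal H$ consists of functions that may depend on $\vartheta$ and other parameters. To treat the different cases economically, we now formulate a version of Theorem 2.1 for such classes. Suppose that $\mathcal H$ is of the form

\[
\mathcal H_* = \{h_{\tau,q} : |\tau - \vartheta| \le \Delta,\ q \in \mathcal Q\}
\]

for some index set $\mathcal Q$, and set $\bar{\mathcal H}_* = \{\bar h : h \in \mathcal H_*\}$.

Theorem 2.2. Suppose that Assumptions B, F, K and R hold, that $\mathcal H_*$ has envelope $H$ of the form $H(y_1, \dots, y_m) = V(y_1)\cdots V(y_m)$ with $V$ satisfying Assumption V, and that

\[
\sup_{q\in\mathcal Q}\bigl|\psi(h_{\tau,q}, f) - \psi(h_{\vartheta,q}, f) - \dot\psi_{\vartheta,q}^\top(\tau - \vartheta)\bigr| = o(|\tau - \vartheta|) \tag{2.9}
\]

for some vector $\dot\psi_{\vartheta,q}$. Let $\bar{\mathcal H}_{*\eta} = \{\bar h(\cdot + s) : |s| \le \eta,\ h \in \mathcal H_*\}$ be Donsker for some $\eta > 0$;

\[
\sup_{q\in\mathcal Q}\int (\bar h_{\tau,q}(y) - \bar h_{\vartheta,q}(y))^2 f(y)\,dy \to 0 \quad\text{as } \tau \to \vartheta; \tag{2.10}
\]

\[
\sup_{|\tau-\vartheta|\le\Delta}\ \sup_{q\in\mathcal Q}\Bigl|\int \bar h_{\tau,q}(y)(f(y - s) - f(y) + sf'(y))\,dy\Bigr| = O(s^2). \tag{2.11}
\]

Set $D^*_{\vartheta,q} = \dot\psi_{\vartheta,q} - B(\bar h^*_{\vartheta,q})$ and $D^{\#}_{\vartheta,q} = \dot\psi_{\vartheta,q} - B(\bar h^{\#}_{\vartheta,q})$. Then

\[
\sup_{q\in\mathcal Q}\Bigl|\psi(h_{\hat\vartheta,q}, \hat f) - \psi(h_{\vartheta,q}, f) - \frac{1}{n}\sum_{i=1}^n \bar h^*_{\vartheta,q}(\varepsilon_i) - (D^*_{\vartheta,q})^\top(\hat\vartheta - \vartheta)\Bigr| = o_p(n^{-1/2})
\]

and

\[
\sup_{q\in\mathcal Q}\Bigl|\psi(h_{\hat\vartheta,q}, \hat f_w) - \psi(h_{\vartheta,q}, f) - \frac{1}{n}\sum_{i=1}^n \bar h^{\#}_{\vartheta,q}(\varepsilon_i) - (D^{\#}_{\vartheta,q})^\top(\hat\vartheta - \vartheta)\Bigr| = o_p(n^{-1/2}).
\]

In particular, if $\hat\vartheta$ is asymptotically linear with influence function $\varphi$ orthogonal to $\mathcal V$, then the process $\{n^{1/2}(\psi(h_{\hat\vartheta,q}, \hat f_w) - \psi(h_{\vartheta,q}, f)) : q \in \mathcal Q\}$ converges weakly in $\ell_\infty(\mathcal Q)$ to a centered Gaussian process with covariance function

\[
\operatorname{Cov}(p, q) = E[\bar h^{\#}_{\vartheta,p}(\varepsilon)\bar h^{\#}_{\vartheta,q}(\varepsilon)] + (D^{\#}_{\vartheta,p})^\top E[\varphi(\mathbf X, \varepsilon)\varphi(\mathbf X, \varepsilon)^\top]D^{\#}_{\vartheta,q}.
\]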
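Proof. We prove only the second expansion. It follows from (2.9) and the $n^{1/2}$-consistency of $\hat\vartheta$ that

\[
\sup_{q\in\mathcal Q}\bigl|\psi(h_{\hat\vartheta,q}, f) - \psi(h_{\vartheta,q}, f) - \dot\psi_{\vartheta,q}^\top(\hat\vartheta - \vartheta)\bigr| = o_p(n^{-1/2}).
\]

It follows from (2.10) that

\[
\sup_{q\in\mathcal Q}\bigl|B(\bar h^{\#}_{\hat\vartheta,q}) - B(\bar h^{\#}_{\vartheta,q})\bigr| = o_p(1).
\]

Since $\bar{\mathcal H}_*$ is Donsker, we obtain from (2.10) that

\[
\sup_{q\in\mathcal Q}\Bigl|\int (\bar h_{\hat\vartheta,q}(y) - \bar h_{\vartheta,q}(y))\,d(\mathbb F(y) - F(y))\Bigr| = o_p(n^{-1/2})
\]

with $\mathbb F(y) = \frac{1}{n}\sum_{i=1}^n \mathbf 1[\varepsilon_i \le y]$. Since $\bar{\mathcal H}_{*\eta}$ is Donsker and (2.11) holds, we obtain from Proposition 2.1 that

\[
\sup_{q\in\mathcal Q}\Bigl|\int \bar h_{\hat\vartheta,q}(y)\tilde f(y)\,dy - \int \bar h_{\hat\vartheta,q}(y)\,d\mathbb F(y)\Bigr| = o_p(n^{-1/2}).
\]

The desired result now follows from Theorem 2.1. □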

A sufficient condition for (2.11) is $\|f(\cdot-s)-f+sf'\|_V=O(s^2)$. This holds, for example, if $\|f'(\cdot-s)-f'\|_V=O(s)$. In particular, it holds if $f'$ is absolutely continuous with $\|f''\|_V$ finite.
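
The order $O(s^2)$ of the Taylor remainder can be checked numerically in a simple case. The following is a minimal sketch, not part of the original argument: it assumes the standard normal density for $f$, takes $V=1$, and reads the norm as $\|g\|_V=\int V|g|\,dy$ (an assumption about the notation). The function names are illustrative only.

```python
import math


def normal_pdf(y):
    """Standard normal density f(y)."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def normal_pdf_prime(y):
    """Derivative f'(y) = -y f(y) of the standard normal density."""
    return -y * normal_pdf(y)


def taylor_remainder_norm(s, lo=-10.0, hi=10.0, steps=40000):
    """Trapezoidal approximation of the integral of
    |f(y - s) - f(y) + s f'(y)| over [lo, hi], i.e. the norm with V = 1."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        y = lo + k * h
        g = abs(normal_pdf(y - s) - normal_pdf(y) + s * normal_pdf_prime(y))
        total += g if 0 < k < steps else 0.5 * g
    return total * h


if __name__ == "__main__":
    # If the remainder is O(s^2), the ratio norm / s^2 should stay bounded.
    for s in (0.1, 0.05, 0.01):
        print(s, taylor_remainder_norm(s) / s ** 2)
```

For small $s$ the ratio should be close to $\frac12\int|f''(y)|\,dy$, which stays bounded as $s\to0$, in line with the sufficient condition above.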

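Also of interest is the case when $\mathcal{H}=\{h_q:q\in\mathcal{Q}\}$. In this case, the assumptions of Theorem 2.2 simplify considerably.

Corollary 2.1. Suppose that Assumptions B, F, K and R hold and $\mathcal{H}=\{h_q:q\in\mathcal{Q}\}$ has envelope $H$ of the form $H(y_1,\dots,y_m)=V(y_1)\cdots V(y_m)$ with $V$ satisfying Assumption V. Let $\bar{\mathcal{H}}_\eta=\{\bar h_q(\cdot+s):|s|<\eta,\ q\in\mathcal{Q}\}$ be Donsker for some $\eta>0$ and

$$\sup_{q\in\mathcal{Q}}\Bigl|\int\bar h_q(y)\bigl(f(y-s)-f(y)+sf'(y)\bigr)\,dy\Bigr|=O(s^2).\tag{2.12}$$

Then

$$\sup_{q\in\mathcal{Q}}\Bigl|\psi(h_q,\hat f)-\psi(h_q,f)-\frac1n\sum_{i=1}^nh^0_q(\varepsilon_i)+B(h^0_q)^{\top}(\hat\vartheta-\vartheta)\Bigr|=o_p(n^{-1/2}),$$

$$\sup_{q\in\mathcal{Q}}\Bigl|\psi(h_q,\hat f_w)-\psi(h_q,f)-\frac1n\sum_{i=1}^nh^*_q(\varepsilon_i)+B(h^*_q)^{\top}(\hat\vartheta-\vartheta)\Bigr|=o_p(n^{-1/2}).$$

In particular, if $\hat\vartheta$ is asymptotically linear with influence function $\varphi$ orthogonal to $V$, then the process $\{n^{1/2}(\psi(h_q,\hat f_w)-\psi(h_q,f)):q\in\mathcal{Q}\}$ converges weakly in $\ell_\infty(\mathcal{Q})$ to a centered Gaussian process with covariance function

$$\operatorname{Cov}(p,q)=E[h^*_p(\varepsilon)h^*_q(\varepsilon)]+B(h^*_p)^{\top}E[\varphi(\mathbf{X},\varepsilon)\varphi(\mathbf{X},\varepsilon)^{\top}]B(h^*_q).$$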
3. Conditional expectations of lag one. Let $\mathcal{Q}$ be a family of functions from $\mathbb{R}$ to $\mathbb{R}$. For $q\in\mathcal{Q}$, the conditional expectation $E(q(X_{n+1})\mid\mathbf{X}_n=\mathbf{x})$ can be written as $\nu(\vartheta,q)=E[q(\varepsilon+r_\vartheta(\mathbf{x}))]$. We estimate $\nu(\vartheta,q)$ by

$$\hat\nu(q)=\int q(y+r_{\hat\vartheta}(\mathbf{x}))\hat f(y)\,dy$$

and

$$\hat\nu_w(q)=\int q(y+r_{\hat\vartheta}(\mathbf{x}))\hat f_w(y)\,dy.$$

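Theorem 3.1. Suppose $\mathcal{Q}_\eta=\{q(\cdot+r_\vartheta(\mathbf{x})+s):|s|<\eta,\ q\in\mathcal{Q}\}$ is Donsker for some $\eta>0$ and has an envelope $V$ that satisfies Assumption V. Suppose $f$ has finite Fisher information for location and satisfies

$$\sup_{|t|<\eta}\ \sup_{q\in\mathcal{Q}}\Bigl|\int q(y+r_\vartheta(\mathbf{x})+t)\bigl(f(y-s)-f(y)+sf'(y)\bigr)\,dy\Bigr|=O(s^2).\tag{3.1}$$

Let Assumptions B, K and R hold. Then

$$\sup_{q\in\mathcal{Q}}\Bigl|\hat\nu(q)-\frac1n\sum_{i=1}^nq(\varepsilon_i+r_\vartheta(\mathbf{x}))-D_q^{\top}(\hat\vartheta-\vartheta)\Bigr|=o_p(n^{-1/2}),$$

$$\sup_{q\in\mathcal{Q}}\Bigl|\hat\nu_w(q)-\frac1n\sum_{i=1}^n\bigl(q(\varepsilon_i+r_\vartheta(\mathbf{x}))-c_q\varepsilon_i\bigr)-(D_q^*)^{\top}(\hat\vartheta-\vartheta)\Bigr|=o_p(n^{-1/2}),$$

where $D_q=E[q(\varepsilon+r_\vartheta(\mathbf{x}))\ell(\varepsilon)](\dot r_\vartheta(\mathbf{x})-E[\dot r_\vartheta(\mathbf{X})])$ and $D_q^*=D_q+c_qE[\dot r_\vartheta(\mathbf{X})]$ with $c_q=\sigma^{-2}E[\varepsilon q(\varepsilon+r_\vartheta(\mathbf{x}))]$.

In particular, if $\hat\vartheta$ is asymptotically linear with influence function $\varphi$ orthogonal to $V$, then the process $\{n^{1/2}(\hat\nu_w(q)-E[q(\varepsilon+r_\vartheta(\mathbf{x}))]):q\in\mathcal{Q}\}$ converges weakly in $\ell_\infty(\mathcal{Q})$ to a centered Gaussian process with covariance function

$$\operatorname{Cov}(p,q)=E[p(\varepsilon+r_\vartheta(\mathbf{x}))q(\varepsilon+r_\vartheta(\mathbf{x}))]-E[p(\varepsilon+r_\vartheta(\mathbf{x}))]E[q(\varepsilon+r_\vartheta(\mathbf{x}))]-\sigma^2c_pc_q+(D_p^*)^{\top}E[\varphi(\mathbf{X},\varepsilon)\varphi(\mathbf{X},\varepsilon)^{\top}]D_q^*.$$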

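Proof. We apply Theorem 2.2 with

$$\mathcal{H}^*=\bar{\mathcal{H}}^*=\{q(\cdot+r_\tau(\mathbf{x})):|\tau-\vartheta|<\Delta,\ q\in\mathcal{Q}\}$$

and some small positive $\Delta$. In view of Assumption R, we can take $\Delta$ sufficiently small for $\bar{\mathcal{H}}^*_{\eta/2}$ to be contained in $\mathcal{Q}_\eta$. Thus, condition (2.11) is implied by (3.1). Since

$$\int\bigl(q(y+r_\tau(\mathbf{x}))-q(y+r_\vartheta(\mathbf{x}))\bigr)f(y)\,dy=\int q(y+r_\vartheta(\mathbf{x}))\bigl(f(y-(r_\tau(\mathbf{x})-r_\vartheta(\mathbf{x})))-f(y)\bigr)\,dy,$$

it follows from (3.1), differentiability of $\tau\mapsto r_\tau(\mathbf{x})$ at $\vartheta$ and finiteness of $\|f'\|_V$ that condition (2.9) holds for the present $\mathcal{H}^*$ with

$$\dot\psi_{\vartheta,q}=E[q(\varepsilon+r_\vartheta(\mathbf{x}))\ell(\varepsilon)]\,\dot r_\vartheta(\mathbf{x}).$$

As the class $\{q(\cdot+r_\vartheta(\mathbf{x})):q\in\mathcal{Q}\}$ is totally bounded and has an envelope $V$ that satisfies (2.7), condition (2.10) is met by the compactness criterion in Lemma 7.1. □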



The conditional distribution function of $X_{n+1}$, given $\mathbf{X}_n=\mathbf{x}$, can be written $t\mapsto F(t-r_\vartheta(\mathbf{x}))$. We can estimate it by $\hat F_s(t-r_{\hat\vartheta}(\mathbf{x}))$ or $\hat F_{sw}(t-r_{\hat\vartheta}(\mathbf{x}))$, where $\hat F_s$ and $\hat F_{sw}$ are the distribution functions corresponding to $\hat f$ and $\hat f_w$, respectively. The corresponding class $\mathcal{Q}$ is $\{\mathbf{1}_{(-\infty,t]}:t\in\mathbb{R}\}$; it is Donsker and translation invariant. Its envelope is $V=1$, which satisfies Assumption V. Here the left-hand side of (3.1) becomes $\sup_{t\in\mathbb{R}}|F(t-s)-F(t)+sf(t)|$. Thus, (3.1) holds if $f$ is Lipschitz. Hence, Theorem 3.1 implies the following result.



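Theorem 3.2. Suppose Assumptions B, K and R hold. Let $f$ be Lipschitz and have finite Fisher information for location. Then

$$\sup_{t\in\mathbb{R}}\Bigl|\hat F_s(t-r_{\hat\vartheta}(\mathbf{x}))-\frac1n\sum_{i=1}^n\mathbf{1}[\varepsilon_i\le t-r_\vartheta(\mathbf{x})]-D_t^{\top}(\hat\vartheta-\vartheta)\Bigr|=o_p(n^{-1/2}),$$

$$\sup_{t\in\mathbb{R}}\Bigl|\hat F_{sw}(t-r_{\hat\vartheta}(\mathbf{x}))-\frac1n\sum_{i=1}^n\bigl(\mathbf{1}[\varepsilon_i\le t-r_\vartheta(\mathbf{x})]-c_t\varepsilon_i\bigr)-(D_t^*)^{\top}(\hat\vartheta-\vartheta)\Bigr|=o_p(n^{-1/2}),$$

where $D_t=-f(t-r_\vartheta(\mathbf{x}))(\dot r_\vartheta(\mathbf{x})-E[\dot r_\vartheta(\mathbf{X})])$ and $D_t^*=D_t+c_tE[\dot r_\vartheta(\mathbf{X})]$ with

$$c_t=\frac{1}{\sigma^2}\int_{-\infty}^{t-r_\vartheta(\mathbf{x})}yf(y)\,dy.$$

In particular, if $\hat\vartheta$ is asymptotically linear with influence function $\varphi$ orthogonal to $V$, then the process $\{n^{1/2}(\hat F_{sw}(t-r_{\hat\vartheta}(\mathbf{x}))-F(t-r_\vartheta(\mathbf{x}))):t\in\mathbb{R}\}$ converges weakly in $\ell_\infty(\mathbb{R})$ to a centered Gaussian process with covariance function

$$\operatorname{Cov}(s,t)=F\bigl((s-r_\vartheta(\mathbf{x}))\wedge(t-r_\vartheta(\mathbf{x}))\bigr)-F(s-r_\vartheta(\mathbf{x}))F(t-r_\vartheta(\mathbf{x}))-\sigma^2c_sc_t+(D_s^*)^{\top}E[\varphi(\mathbf{X},\varepsilon)\varphi(\mathbf{X},\varepsilon)^{\top}]D_t^*.$$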

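Example 3.1. For $0<u<1$, let $\psi_u(G)=G^{-1}(u)=\inf\{t:G(t)\ge u\}$ denote the left-inverse of a distribution function $G$ at $u$. The conditional $u$-quantile of $X_{n+1}$, given $\mathbf{X}_n=\mathbf{x}$, is $\psi_u(F(\cdot-r_\vartheta(\mathbf{x})))=F^{-1}(u)+r_\vartheta(\mathbf{x})$. We can estimate it by $\psi_u(\hat F_{sw}(\cdot-r_{\hat\vartheta}(\mathbf{x})))=\hat F_{sw}^{-1}(u)+r_{\hat\vartheta}(\mathbf{x})$. Let $0<c<d<1$. Recall that we assumed that the density $f$ is positive. Thus, by Proposition 1 of [10] on compact differentiability of quantile functions, we obtain the uniform stochastic expansion

$$\sup_{c\le u\le d}\Bigl|\hat F_{sw}^{-1}(u)+r_{\hat\vartheta}(\mathbf{x})-\bigl(F^{-1}(u)+r_\vartheta(\mathbf{x})\bigr)+\frac{1}{f(F^{-1}(u))}\,\frac1n\sum_{i=1}^n\bigl(\mathbf{1}[\varepsilon_i\le F^{-1}(u)]-u-a_u\varepsilon_i\bigr)-\Bigl(\dot r_\vartheta(\mathbf{x})-\Bigl(1+\frac{a_u}{f(F^{-1}(u))}\Bigr)E[\dot r_\vartheta(\mathbf{X})]\Bigr)^{\top}(\hat\vartheta-\vartheta)\Bigr|=o_p(n^{-1/2})$$

with

$$a_u=\frac{1}{\sigma^2}\int_{-\infty}^{F^{-1}(u)}yf(y)\,dy.$$

It follows that the smoothed and weighted conditional quantile process

$$\bigl\{n^{1/2}\bigl(\hat F_{sw}^{-1}(u)+r_{\hat\vartheta}(\mathbf{x})-(F^{-1}(u)+r_\vartheta(\mathbf{x}))\bigr):u\in[c,d]\bigr\}$$

converges weakly in $\ell_\infty([c,d])$ to a centered Gaussian process.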
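
To make the construction concrete, here is a minimal sketch of the smoothed (unweighted) estimators of the conditional distribution function and the conditional quantile. It is not the authors' implementation: it assumes a Gaussian kernel, takes the residuals, the bandwidth and the fitted value $r_{\hat\vartheta}(\mathbf{x})$ as given inputs, and omits the weights of the weighted version. All function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def smoothed_cdf(residuals, bandwidth, t):
    """Distribution function at t of a Gaussian-kernel density estimate
    built from the residuals (assumption: Gaussian kernel, no weights)."""
    return sum(normal_cdf((t - e) / bandwidth) for e in residuals) / len(residuals)


def conditional_cdf(residuals, bandwidth, r_hat_x, t):
    """Estimate of P(X_{n+1} <= t | X_n = x), computed as F_hat(t - r_hat(x))."""
    return smoothed_cdf(residuals, bandwidth, t - r_hat_x)


def conditional_quantile(residuals, bandwidth, r_hat_x, u, tol=1e-10):
    """Left-inverse of the smoothed residual distribution function at u,
    shifted by r_hat(x). Found by bisection; assumes 0 < u < 1 is not
    astronomically close to 0 or 1."""
    lo = min(residuals) - 10.0 * bandwidth
    hi = max(residuals) + 10.0 * bandwidth
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if smoothed_cdf(residuals, bandwidth, mid) >= u:
            hi = mid
        else:
            lo = mid
    return hi + r_hat_x


if __name__ == "__main__":
    toy_residuals = [-1.0, 0.0, 1.0]  # toy data, for illustration only
    print(conditional_cdf(toy_residuals, 0.5, 1.0, 1.0))
    print(conditional_quantile(toy_residuals, 0.5, 1.0, 0.5))
```

Because the smoothed distribution function is continuous and strictly increasing, bisection recovers the left-inverse; the conditional quantile is then a pure location shift by the fitted autoregression value, as in the example above.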

Example 3.2. Consider the classical AR(1) model $X_i=\vartheta X_{i-1}+\varepsilon_i$ with $|\vartheta|<1$. It satisfies Condition R with $\dot r_\vartheta(x)=x$ and $E[\dot r_\vartheta(X)]=0$. A natural estimator for $\vartheta$ is the least squares estimator $\hat\vartheta$, which has expansion

$$\hat\vartheta=\frac{\sum_{i=1}^nX_{i-1}X_i}{\sum_{i=1}^nX_{i-1}^2}=\vartheta+\frac1n\sum_{i=1}^n\frac{1-\vartheta^2}{\sigma^2}X_{i-1}\varepsilon_i+o_p(n^{-1/2}).\tag{3.2}$$

Fix $t$ and $x$ in $\mathbb{R}$. For the estimator of the conditional distribution function at $t$ of $X_{n+1}$, given $X_n=x$, we obtain

$$\hat F_s(t-\hat\vartheta x)=\frac1n\sum_{i=1}^n\Bigl(\mathbf{1}[\varepsilon_i\le t-\vartheta x]-xf(t-\vartheta x)\frac{1-\vartheta^2}{\sigma^2}X_{i-1}\varepsilon_i\Bigr)+o_p(n^{-1/2}),$$

$$\hat F_{sw}(t-\hat\vartheta x)=\frac1n\sum_{i=1}^n\Bigl(\mathbf{1}[\varepsilon_i\le t-\vartheta x]-c_t\varepsilon_i-xf(t-\vartheta x)\frac{1-\vartheta^2}{\sigma^2}X_{i-1}\varepsilon_i\Bigr)+o_p(n^{-1/2}).$$

It follows that $n^{1/2}(\hat F_s(t-\hat\vartheta x)-F(t-\vartheta x))$ is asymptotically normal with mean zero and variance $\tau^2=F(t-\vartheta x)(1-F(t-\vartheta x))+x^2f^2(t-\vartheta x)(1-\vartheta^2)$, while $n^{1/2}(\hat F_{sw}(t-\hat\vartheta x)-F(t-\vartheta x))$ is asymptotically normal with mean zero and variance $\tau_w^2=\tau^2-\sigma^2c_t^2$. Thus, weighting results in a smaller asymptotic variance. For $t=x=0$ and $f$ the standard normal density, the asymptotic variances are $1/4$ and $1/4-1/(2\pi)\approx0.0908$. In this case weighting reduces the asymptotic variance by about 64%.
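
The two variances quoted in Example 3.2 can be reproduced numerically. The sketch below is an illustration under stated assumptions: standard normal innovations (so $\sigma^2=1$), the variance formulas $\tau^2$ and $\tau_w^2=\tau^2-\sigma^2c_t^2$ as given in the example, and a plain trapezoidal rule for $c_t$. The function names are hypothetical.

```python
import math


def normal_pdf(y):
    """Standard normal density."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def normal_cdf(y):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def truncated_first_moment(upper, lower=-12.0, steps=20000):
    """Trapezoidal approximation of the integral of y f(y) over (lower, upper);
    lower = -12 stands in for minus infinity (requires upper > lower)."""
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps + 1):
        y = lower + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * y * normal_pdf(y)
    return total * h


def ar1_lag_one_variances(t, x, theta):
    """Asymptotic variances (unweighted, weighted) of the lag-one conditional
    distribution function estimators in the AR(1) model, sigma^2 = 1."""
    s = t - theta * x
    tau2 = (normal_cdf(s) * (1.0 - normal_cdf(s))
            + x ** 2 * normal_pdf(s) ** 2 * (1.0 - theta ** 2))
    c_t = truncated_first_moment(s)  # sigma^{-2} = 1 for standard normal f
    tau2_w = tau2 - c_t ** 2         # sigma^2 * c_t^2 with sigma^2 = 1
    return tau2, tau2_w


if __name__ == "__main__":
    tau2, tau2_w = ar1_lag_one_variances(t=0.0, x=0.0, theta=0.5)
    print(tau2, tau2_w, 1.0 - tau2_w / tau2)
```

At $t=x=0$ the value of $\vartheta$ does not enter, so any $|\vartheta|<1$ gives the same pair of variances.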

Now consider the case where $\mathcal{Q}$ consists of one element $q$. The corresponding class $\mathcal{Q}_\eta$ equals $\{q(\cdot+r_\vartheta(\mathbf{x})+s):|s|<\eta\}$. Assume now that $f$ has a finite absolute moment of order greater than $2\gamma+1$ and that $q$ satisfies the growth condition

$$|q(y)|\le(1+|y|)^\gamma,\qquad y\in\mathbb{R},\tag{3.3}$$

and the Lipschitz condition

$$|q(y+s_1)-q(y+s_2)|\le L|s_1-s_2|(1+|y|)^\gamma,\qquad y\in\mathbb{R},\tag{3.4}$$

for $s_1$, $s_2$ in a neighborhood of $r_\vartheta(\mathbf{x})$. Then $\mathcal{Q}_\eta$ has envelope $V$ of the form $V(y)=K(1+|y|)^\gamma$, which satisfies Assumption V. Also, $\mathcal{Q}_\eta$ is Donsker. This follows since the bracketing numbers $N_{[\,]}(\delta,\mathcal{Q}_\eta,L_2(F))$ are of order $1/\delta$; take brackets of the form $q(\cdot+r_\vartheta(\mathbf{x})+s_j)\mp c\delta V$. The left-hand side of (3.1) now becomes $\sup_{|t|<\eta}|\Delta_{s,t}|$ with

$$\Delta_{s,t}=\int q(y+r_\vartheta(\mathbf{x})+t)\bigl(f(y-s)-f(y)+sf'(y)\bigr)\,dy.$$

We can write

$$\Delta_{s,t}=-\int q(y+r_\vartheta(\mathbf{x})+t)\int_0^1s\bigl(f'(y-us)-f'(y)\bigr)\,du\,dy=-s\int_0^1\int\bigl(q(y+r_\vartheta(\mathbf{x})+t+us)-q(y+r_\vartheta(\mathbf{x})+t)\bigr)f'(y)\,dy\,du.$$

By the Lipschitz property of $q$ and the finiteness of $\|f'\|_V$ under Assumption F, we obtain $\sup_{|t|<\eta}|\Delta_{s,t}|=O(s^2)$, which is (3.1). Thus, Theorem 3.1 implies the following result.

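Theorem 3.3. Suppose Assumptions B, K and R hold, $q$ satisfies (3.3) and (3.4) and $f$ has finite Fisher information for location and finite absolute moment of order greater than $2\gamma+1$. Then

$$\int q(y+r_{\hat\vartheta}(\mathbf{x}))\hat f(y)\,dy=\frac1n\sum_{i=1}^nq(\varepsilon_i+r_\vartheta(\mathbf{x}))+D_q^{\top}(\hat\vartheta-\vartheta)+o_p(n^{-1/2});$$

$$\int q(y+r_{\hat\vartheta}(\mathbf{x}))\hat f_w(y)\,dy=\frac1n\sum_{i=1}^n\bigl(q(\varepsilon_i+r_\vartheta(\mathbf{x}))-c_q\varepsilon_i\bigr)+(D_q^*)^{\top}(\hat\vartheta-\vartheta)+o_p(n^{-1/2}),$$

with $c_q$, $D_q$ and $D_q^*$ as in Theorem 3.1.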
Theorem 3.3 can be used to estimate conditional moments and absolute moments of lag one. For example, to treat estimation of the conditional $\gamma$th absolute moment $E(|X_{n+1}|^\gamma\mid\mathbf{X}_n=\mathbf{x})$ with $\gamma>1$, take $q(y)=|y|^\gamma$. Our estimators are $\int|y+r_{\hat\vartheta}(\mathbf{x})|^\gamma\hat f(y)\,dy$ and its weighted version $\int|y+r_{\hat\vartheta}(\mathbf{x})|^\gamma\hat f_w(y)\,dy$.
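
As an illustration of the unweighted estimator for $\gamma=2$, the following sketch simulates an AR(1) series, fits $\vartheta$ by least squares, and evaluates $\int(y+\hat\vartheta x)^2\hat f(y)\,dy$. Several things here are assumptions made for the example and not taken from the text: the Gaussian kernel, the bandwidth $n^{-1/3}$, standard normal innovations, starting the chain at zero without burn-in, and all function names. With a Gaussian kernel the integral has the closed form used in the code, because each kernel component is a normal density centered at a residual with variance $b^2$.

```python
import random


def simulate_ar1(n, theta, seed=0):
    """Simulate X_0, ..., X_n from X_i = theta * X_{i-1} + eps_i with
    standard normal innovations, started at X_0 = 0 (no burn-in)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n):
        x.append(theta * x[-1] + rng.gauss(0.0, 1.0))
    return x


def least_squares_ar1(x):
    """Least squares estimator: sum X_{i-1} X_i / sum X_{i-1}^2."""
    num = sum(x[i - 1] * x[i] for i in range(1, len(x)))
    den = sum(x[i - 1] ** 2 for i in range(1, len(x)))
    return num / den


def conditional_second_moment(x, x0, bandwidth):
    """Unweighted smoothed estimate of E(X_{n+1}^2 | X_n = x0).

    For a Gaussian kernel with bandwidth b, the integral of
    (y + shift)^2 against the kernel density estimate equals
    mean((e_i + shift)^2) + b^2, where e_i are the residuals."""
    theta_hat = least_squares_ar1(x)
    residuals = [x[i] - theta_hat * x[i - 1] for i in range(1, len(x))]
    shift = theta_hat * x0
    mean_sq = sum((e + shift) ** 2 for e in residuals) / len(residuals)
    return mean_sq + bandwidth ** 2


if __name__ == "__main__":
    n = 5000
    series = simulate_ar1(n, theta=0.5, seed=1)
    # True value in this setup: sigma^2 + (theta * x0)^2 = 1 + 0.25.
    print(conditional_second_moment(series, x0=1.0, bandwidth=n ** (-1.0 / 3.0)))
```

The weighted version would replace the uniform weights $1/n$ by the weights defining $\hat f_w$; these are not reproduced here.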

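4. Conditional expectations of lag two. Let $\mathcal{Q}$ be a family of functions from $\mathbb{R}^2$ to $\mathbb{R}$. For $q\in\mathcal{Q}$, the conditional expectation

$$E(q(X_{n+1},X_{n+2})\mid\mathbf{X}_n=\mathbf{x})$$

can be written

$$\nu(\vartheta,q)=E[q(\varrho_\vartheta(\varepsilon_1,\varepsilon_2))]=\iint q(\varrho_\vartheta(y,z))f(y)f(z)\,dy\,dz$$

with

$$\varrho_\vartheta(y,z)=\bigl(y+r_\vartheta(\mathbf{x}),\ z+r_\vartheta(\mathbf{x}_{-1},y+r_\vartheta(\mathbf{x}))\bigr),$$

where $\mathbf{x}_{-1}=(x_2,\dots,x_p)$. We estimate $\nu(\vartheta,q)$ by

$$\hat\nu(q)=\iint q(\varrho_{\hat\vartheta}(y,z))\hat f(y)\hat f(z)\,dy\,dz$$

and its weighted version

$$\hat\nu_w(q)=\iint q(\varrho_{\hat\vartheta}(y,z))\hat f_w(y)\hat f_w(z)\,dy\,dz.$$

We shall apply Theorem 2.2 to obtain stochastic expansions for these estimators.
We have

$$h_{\tau,q}(y,z)=q(\varrho_\tau(y,z))\quad\text{and}\quad\bar h_{\tau,q}=h^{(1)}_{\tau,q}+h^{(2)}_{\tau,q}$$

with

$$h^{(1)}_{\tau,q}(y)=\int q(\varrho_\tau(y,u))f(u)\,du\quad\text{and}\quad h^{(2)}_{\tau,q}(z)=\int q(\varrho_\tau(u,z))f(u)\,du.$$

To get an envelope for the class $\mathcal{H}^*=\{h_{\tau,q}:|\tau-\vartheta|<\Delta,\ q\in\mathcal{Q}\}$, we assume that $\mathcal{Q}$ has an envelope $V_{\mathcal{Q}}$ of the form

$$V_{\mathcal{Q}}(x_1,x_2)=C_{\mathcal{Q}}(1+|x_1|)^{\gamma_1}(1+|x_2|)^{\gamma_2}\tag{4.1}$$

for some finite constant $C_{\mathcal{Q}}$ and nonnegative exponents $\gamma_1$ and $\gamma_2$, and impose the following growth condition on the autoregression functions: for some constant $A$,

$$|r_\tau(u_1,\dots,u_p)|\le A\Bigl(1+\sum_{j=1}^p|u_j|\Bigr),\qquad|\tau-\vartheta|<\Delta.\tag{4.2}$$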
Such a growth condition is typically needed for ergodicity of the model; see [4, 5] and [1]. There is then a constant $C'_{\mathcal{Q}}$ such that

$$\begin{aligned}|q(\varrho_\tau(y,z))|&\le C_{\mathcal{Q}}(1+|y+r_\tau(\mathbf{x})|)^{\gamma_1}(1+|z+r_\tau(\mathbf{x}_{-1},y+r_\tau(\mathbf{x}))|)^{\gamma_2}\\&\le C'_{\mathcal{Q}}(1+|y|)^{\gamma_1}(1+|z|+|y|)^{\gamma_2}\\&\le C'_{\mathcal{Q}}(1+|y|)^{\gamma_1+\gamma_2}(1+|z|)^{\gamma_2}.\end{aligned}\tag{4.3}$$

Thus, $\mathcal{H}^*$ has an envelope $H$ of the form $H(y,z)=V(y)V(z)$ with $V(y)=K(1+|y|)^{\gamma_1+\gamma_2}$. This $V$ satisfies Assumption V if $f$ has finite absolute moment of order greater than $2\gamma_1+2\gamma_2+1$. We can now use the special structure of $h_{\tau,q}$ to show that (2.9) holds.

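Lemma 4.1. Let $\mathcal{Q}$ have envelope $V_{\mathcal{Q}}$ of the form (4.1). Suppose that $f$ has finite Fisher information for location and finite absolute moment of order greater than $2\gamma_1+2\gamma_2+1$. Suppose Assumption R and the growth condition (4.2) hold and that

$$\int\bigl(r_{\vartheta+t}(\mathbf{x}_{-1},y)-r_\vartheta(\mathbf{x}_{-1},y)-\dot r_\vartheta(\mathbf{x}_{-1},y)^{\top}t\bigr)^2f(y-r_\vartheta(\mathbf{x}))\,dy=o(|t|^2).\tag{4.4}$$

Then

$$\sup_{q\in\mathcal{Q}}\Bigl|\iint\bigl(q(\varrho_{\vartheta+t}(y,z))-q(\varrho_\vartheta(y,z))-q(\varrho_\vartheta(y,z))\chi(\varrho_\vartheta(y,z))^{\top}t\bigr)f(y)f(z)\,dy\,dz\Bigr|=o(|t|),$$

where $\chi(\varrho_\vartheta(y,z))=\ell(y)\dot r_\vartheta(\mathbf{x})+\ell(z)\dot r_\vartheta(\mathbf{x}_{-1},y+r_\vartheta(\mathbf{x}))$, with $\chi$ the score function given in (4.7). Thus, condition (2.9) holds with

$$\begin{aligned}\dot\psi_{\vartheta,q}&=\iint q(\varrho_\vartheta(y,z))\chi(\varrho_\vartheta(y,z))f(y)f(z)\,dy\,dz\\&=-\iint q(\varrho_\vartheta(y,z))\bigl(f'(y)f(z)\dot r_\vartheta(\mathbf{x})+f(y)f'(z)\dot r_\vartheta(\mathbf{x}_{-1},y+r_\vartheta(\mathbf{x}))\bigr)\,dy\,dz.\end{aligned}$$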

Proof. It is easy to check that $\varrho_\tau(\varepsilon_1,\varepsilon_2)$ has a density $p_\tau$ with respect to the Lebesgue measure $\lambda_2$ on $\mathbb{R}^2$ of the form

$$p_\tau(y,z)=f(y-r_\tau(\mathbf{x}))f(z-r_\tau(\mathbf{x}_{-1},y)).\tag{4.5}$$

We can write the integral in the assertion as

$$\iint q(y,z)\bigl(p_{\vartheta+t}(y,z)-p_\vartheta(y,z)-\chi(y,z)^{\top}t\,p_\vartheta(y,z)\bigr)\,dy\,dz,\tag{4.6}$$

where $\chi$ is the score function at $\vartheta$ of the parametric model $\mathcal{P}=\{p_\tau:|\tau-\vartheta|<\Delta\}$:

$$\chi(y,z)=\ell(y-r_\vartheta(\mathbf{x}))\dot r_\vartheta(\mathbf{x})+\ell(z-r_\vartheta(\mathbf{x}_{-1},y))\dot r_\vartheta(\mathbf{x}_{-1},y).\tag{4.7}$$

Actually, $\chi$ is the Hellinger derivative of this model at $\vartheta$. Indeed, since $f$ has finite Fisher information for location, the model $\{f(\cdot-r_\tau(\mathbf{x})):|\tau-\vartheta|<\Delta\}$ is Hellinger differentiable at $\vartheta$ with Hellinger derivative $\dot r_\vartheta(\mathbf{x})\ell(\cdot-r_\vartheta(\mathbf{x}))$, and this and (4.4) yield the Hellinger differentiability of $\mathcal{P}$ at $\vartheta$ with Hellinger derivative $\chi$; see Proposition A.6 in [13]. It is easy to check that $\int V_{\mathcal{Q}}^2p_\tau\,d\lambda_2\to\int V_{\mathcal{Q}}^2p_\vartheta\,d\lambda_2$ as $\tau\to\vartheta$. Thus, Lemma 7.2 yields the desired result. □

We now address sufficient conditions for (2.10). 

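Lemma 4.2. Suppose the assumptions of Lemma 4.1 hold. Then (2.10) is implied by

$$\sup_{q\in\mathcal{Q}}\int\bigl(h^{(1)}_{\vartheta,q}(y+s)-h^{(1)}_{\vartheta,q}(y)\bigr)^2f(y)\,dy\to0\quad\text{as }s\to0,\tag{4.8}$$

$$\sup_{q\in\mathcal{Q}}\int\Bigl(\int\bigl(q(\varrho_\vartheta(y,z+\Delta_\tau(y)))-q(\varrho_\vartheta(y,z))\bigr)f(y)\,dy\Bigr)^2f(z)\,dz\to0\quad\text{as }\tau\to\vartheta,\tag{4.9}$$

where $\Delta_\tau(y)=r_\tau(\mathbf{x}_{-1},y+r_\vartheta(\mathbf{x}))-r_\vartheta(\mathbf{x}_{-1},y+r_\vartheta(\mathbf{x}))$.

Proof. With $s=r_\tau(\mathbf{x})-r_\vartheta(\mathbf{x})$, we can write $\varrho_\tau(y,z)=\varrho_\vartheta(y+s,z+\Delta_\tau(y+s))$ and then

$$h^{(1)}_{\tau,q}(y)=\int q(\varrho_\vartheta(y+s,z))f(z-\Delta_\tau(y+s))\,dz,$$

$$h^{(2)}_{\tau,q}(z)=\int q(\varrho_\vartheta(y,z+\Delta_\tau(y)))f(y-s)\,dy.$$

In view of (4.8) and (4.9), it suffices to show that, as $\tau\to\vartheta$,

$$\sup_{q\in\mathcal{Q}}\int\bigl(h^{(1)}_{\tau,q}(y)-h^{(1)}_{\vartheta,q}(y+s)\bigr)^2f(y)\,dy\to0,\tag{4.10}$$

$$\sup_{q\in\mathcal{Q}}\int\Bigl(\int q(\varrho_\vartheta(y,z+\Delta_\tau(y)))\bigl(f(y-s)-f(y)\bigr)\,dy\Bigr)^2f(z)\,dz\to0.\tag{4.11}$$

The Cauchy-Schwarz inequality gives

$$\begin{aligned}\bigl(h^{(1)}_{\tau,q}(y)-h^{(1)}_{\vartheta,q}(y+s)\bigr)^2&=\Bigl(\int q(\varrho_\vartheta(y+s,z))\bigl(f(z-\Delta_\tau(y+s))-f(z)\bigr)\,dz\Bigr)^2\\&\le\int q^2(\varrho_\vartheta(y+s,z))\bigl(f(z-\Delta_\tau(y+s))+f(z)\bigr)\,dz\times\int\bigl|f(z-\Delta_\tau(y+s))-f(z)\bigr|\,dz.\end{aligned}$$

Using (4.3), we can bound the first integral on the right-hand side by $C_{\mathcal{Q}}[(1+|y|)^{2\gamma_1+2\gamma_2}+(1+|y+s|)^{2\gamma_1+2\gamma_2}]\int(1+|z|)^{2\gamma_2}f(z)\,dz$, while the second integral can be bounded by $|\Delta_\tau(y+s)|\,\|f'\|_1$. Indeed, since $f$ has finite Fisher information, $f'$ is integrable and $\int|f(z-v)-f(z)|\,dz\le|v|\,\|f'\|_1$ for every real $v$. Using these bounds, we obtain that the left-hand side of (4.10) is bounded for $|s|<1$ by a constant times

$$\int(1+|y|)^{2\gamma_1+2\gamma_2}|\Delta_\tau(y)|f(y-s)\,dy.$$

Since $|\Delta_\tau(y)|\le A(1+|y|)$ and $\Delta_\tau(y)\to0$ for every $y$, and since

$$\int(1+|y|)^{2\gamma_1+2\gamma_2+1}|f(y-s)-f(y)|\,dy\to0,$$

we get (4.10). A similar argument yields (4.11). □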

Remark 4.1. In view of the characterization of compact subsets of $L_2(F)$ given in Lemma 7.1, the above assumptions imply that condition (4.8) is equivalent to total boundedness of $\mathcal{H}^{(1)}_\vartheta=\{h^{(1)}_{\vartheta,q}:q\in\mathcal{Q}\}$ in $L_2(F)$. Consequently, (4.8) holds if $\mathcal{Q}$ is a finite set or if $\mathcal{H}^{(1)}_\vartheta$ is Donsker.

Let us now assume that the class $\mathcal{Q}$ satisfies the Lipschitz property

$$|q(y_1,z_1)-q(y_2,z_2)|\le L_1(y,z)|y_1-y_2|+L_2(y,z)|z_1-z_2|,\tag{4.12}$$

where $y=|y_1|\vee|y_2|$ and $z=|z_1|\vee|z_2|$ and where

$$L_1(y,z)=C_1(1+|y|)^{\alpha_1}(1+|z|)^{\alpha_2}\quad\text{and}\quad L_2(y,z)=C_2(1+|y|)^{\beta_1}(1+|z|)^{\beta_2}$$

for constants $C_1$, $C_2$ and nonnegative exponents $\alpha_1$, $\alpha_2$, $\beta_1$ and $\beta_2$. Let us set $\alpha=\max\{\alpha_1+\alpha_2,\beta_1+\beta_2\}$. Then we derive that, for each $C$, there is a $C_*$ such that, for all $y$, $z$, all $|s_1|,|s_2|,|t_1|,|t_2|\le C$ and $|a_1(y)|,|a_2(y)|\le C(1+|y|)$,

$$|q(y+s_1,z+t_1+a_1(y))-q(y+s_2,z+t_2+a_2(y))|\le C_*\bigl(|s_1-s_2|+|t_1-t_2|+|a_1(y)-a_2(y)|\bigr)(1+|y|)^\alpha(1+|z|)^\alpha.$$

With the help of this inequality, it is now easy to check that, under the assumptions of Lemma 4.1, the statements (4.8) and (4.9) are met, so that (2.10) holds by Lemma 4.2. Using

$$f(y-s)-f(y)+sf'(y)=-s\int_0^1\bigl(f'(y-ws)-f'(y)\bigr)\,dw,\tag{4.13}$$

the left-hand side of (2.11) can be bounded by $|s|(T_1(s)+T_2(s))$, where

$$T_1(s)=\sup_{0<w<1}\ \sup_{|\tau-\vartheta|<\Delta}\ \sup_{q\in\mathcal{Q}}\Bigl|\iint\bigl(q(\varrho_\tau(y+ws,z))-q(\varrho_\tau(y,z))\bigr)f'(y)f(z)\,dy\,dz\Bigr|,$$

$$T_2(s)=\sup_{0<w<1}\ \sup_{|\tau-\vartheta|<\Delta}\ \sup_{q\in\mathcal{Q}}\Bigl|\iint\bigl(q(\varrho_\tau(y,z+ws))-q(\varrho_\tau(y,z))\bigr)f(y)f'(z)\,dy\,dz\Bigr|.$$

Using the Lipschitz property (4.12) of $\mathcal{Q}$, we see that $T_2(s)=O(s)$ and $T_1(s)=O(s)+O(T_3(s))$, where

$$T_3(s)=\sup_{0<w<1}\ \sup_{|\tau-\vartheta|<\Delta}\int|r_\tau(\mathbf{x}_{-1},y+ws)-r_\tau(\mathbf{x}_{-1},y)|(1+|y|)^\alpha|f'(y)|\,dy.$$

This shows that (2.11) holds if $T_3(s)=O(s)$.

To obtain that the class $\bar{\mathcal{H}}^*_\eta$ is Donsker, we will impose the following conditions (B1) and (B2) on $\mathcal{Q}$ and the class $\mathcal{R}$ defined by

$$\mathcal{R}=\{r_\tau(\mathbf{x}_{-1},\cdot+r_\vartheta(\mathbf{x})+s):|\tau-\vartheta|<\Delta,\ |s|<\eta+\bar\Delta\},$$

with $\bar\Delta=\sup_{|\tau-\vartheta|<\Delta}|r_\tau(\mathbf{x})-r_\vartheta(\mathbf{x})|$. Note that the growth condition (4.2) implies that $\mathcal{R}$ has an envelope of the form $A_{\mathcal{R}}(1+|y|)$ for some constant $A_{\mathcal{R}}$.

(B1) For some integer $k$ and every $\delta>0$, there are $N=N_\delta=O(\delta^{-k})$ elements $q_1,\dots,q_N$ in $\mathcal{Q}$ such that $\mathcal{Q}$ is covered by the brackets $[q_i-\delta V_{\mathcal{Q}},q_i+\delta V_{\mathcal{Q}}]$, $i=1,\dots,N$.

(B2) The class $\mathcal{R}$ has $L_2(\mu)$-bracketing numbers of polynomial growth for $\mu(dy)=(1+|y|)^{2\alpha}f(y)\,dy$: For some integer $j$,

$$N_{[\,]}(\delta,\mathcal{R},L_2(\mu))=O(\delta^{-j}).$$

These properties, the growth condition (4.2) and the Lipschitz property (4.12) of $\mathcal{Q}$ imply that the class $\mathcal{G}=\{(y,z)\mapsto q(y+s,z+t+a(y)):q\in\mathcal{Q},\ a\in\mathcal{R},\ |s|,|t|\le C\}$ has $L_2(F\times F)$-bracketing numbers with polynomial growth for each finite $C$. Indeed, for $C>A_{\mathcal{R}}$, we can consider brackets of the form

$$q_i(y+u,z+v+a(y))\pm C_*\bigl(2\delta+|a^*(y)-a_*(y)|\bigr)w(y)w(z)\pm\delta V_{\mathcal{Q}}(y+u,z+v+a(y)),$$

where $w(x)=(1+|x|)^\alpha$, $u$ and $v$ belong to the grid $\{i\delta:i\in\mathbb{Z},\ |i\delta|\le C\}$ and $a$ is the midpoint of a bracket $[a_*,a^*]$ for $\mathcal{R}$. Since $\mathcal{G}$ has polynomial growth, so do the classes $\mathcal{G}_1=\{\int g(\cdot,z)f(z)\,dz:g\in\mathcal{G}\}$ and $\mathcal{G}_2=\{\int g(y,\cdot)f(y)\,dy:g\in\mathcal{G}\}$. Hence, these classes are Donsker. Since subsets and sums of Donsker classes are Donsker, and since $\bar{\mathcal{H}}^*_\eta\subset\mathcal{G}_1+\mathcal{G}_2$ for large enough $C$, we see that $\bar{\mathcal{H}}^*_\eta$ is Donsker. Thus, we have the following result.

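Theorem 4.1. Suppose Assumptions B, K and R hold. Suppose the class $\mathcal{Q}$ has envelope $V_{\mathcal{Q}}$ given by (4.1) and satisfies the Lipschitz property (4.12) and the growth property (B1). Let $f$ have finite Fisher information for location and a finite absolute moment of order greater than $2\gamma_1+2\gamma_2+1$. Let $\mathcal{R}$ satisfy (B2) and let the autoregression functions satisfy the growth conditions (4.2), the differentiability condition (4.4) and

$$\sup_{|\tau-\vartheta|<\Delta}\int|r_\tau(\mathbf{x}_{-1},y+s)-r_\tau(\mathbf{x}_{-1},y)|(1+|y|)^\alpha|f'(y)|\,dy=O(s).$$

Then

$$\sup_{q\in\mathcal{Q}}\Bigl|\hat\nu(q)-\nu(\vartheta,q)-\frac1n\sum_{i=1}^nh^0_{\vartheta,q}(\varepsilon_i)-(D^0_{\vartheta,q})^{\top}(\hat\vartheta-\vartheta)\Bigr|=o_p(n^{-1/2}),$$

$$\sup_{q\in\mathcal{Q}}\Bigl|\hat\nu_w(q)-\nu(\vartheta,q)-\frac1n\sum_{i=1}^nh^*_{\vartheta,q}(\varepsilon_i)-(D^*_{\vartheta,q})^{\top}(\hat\vartheta-\vartheta)\Bigr|=o_p(n^{-1/2}),$$

where $D^0_{\vartheta,q}=\dot\psi_{\vartheta,q}-B(h^0_{\vartheta,q})$ and $D^*_{\vartheta,q}=\dot\psi_{\vartheta,q}-B(h^*_{\vartheta,q})$, with $\dot\psi_{\vartheta,q}$ as given in Lemma 4.1.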



Exactly as in Sections 2 and 3, one obtains functional central limit theorems for the von Mises statistics and empirical estimators for their asymptotic covariance functions.



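5. Conditional distribution functions and quantiles of lag two. The conditional distribution function of the pair $(X_{n+1},X_{n+2})$, given $\mathbf{X}_n=\mathbf{x}$, is defined by

$$F_{\mathbf{x}}(t,u)=P\bigl(\varepsilon_1+r_\vartheta(\mathbf{x})\le t,\ \varepsilon_2+r_\vartheta(\mathbf{x}_{-1},\varepsilon_1+r_\vartheta(\mathbf{x}))\le u\bigr)=\int_{-\infty}^tF(u-r_\vartheta(\mathbf{x}_{-1},y))f(y-r_\vartheta(\mathbf{x}))\,dy.$$

This can also be written as

$$F_{\mathbf{x}}(t,u)=E[q_{t,u}(\varrho_\vartheta(\varepsilon_1,\varepsilon_2))]$$

with $q_{t,u}(v,w)=\mathbf{1}[v\le t,w\le u]$. We estimate $F_{\mathbf{x}}(t,u)$ by

$$\hat F_{\mathbf{x}}(t,u)=\iint q_{t,u}(\varrho_{\hat\vartheta}(y,z))\hat f(y)\hat f(z)\,dy\,dz$$

and its weighted version

$$\hat F_{\mathbf{x}w}(t,u)=\iint q_{t,u}(\varrho_{\hat\vartheta}(y,z))\hat f_w(y)\hat f_w(z)\,dy\,dz.$$

Here the class $\mathcal{Q}$ equals $\{q_{t,u}:t,u\in\mathbb{R}\}$. It has envelope $V_{\mathcal{Q}}=1$; thus, condition (4.1) holds with $\gamma_1=\gamma_2=0$ and $C_{\mathcal{Q}}=1$. We have

$$h^{(1)}_{\tau,t,u}(y)=F\bigl(u-r_\tau(\mathbf{x}_{-1},y+r_\tau(\mathbf{x}))\bigr)\mathbf{1}[y\le t-r_\tau(\mathbf{x})],$$

$$h^{(2)}_{\tau,t,u}(z)=\int_{-\infty}^t\mathbf{1}[z\le u-r_\tau(\mathbf{x}_{-1},y)]f(y-r_\tau(\mathbf{x}))\,dy.$$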

We shall now show that $\bar{\mathcal{H}}^*_\eta$ is Donsker if the class $\mathcal{R}$ has $L_2(F)$-bracketing numbers with polynomial growth:

$$N_{[\,]}(\delta,\mathcal{R},L_2(F))=O(\delta^{-j})\quad\text{for some positive integer }j.\tag{5.1}$$

It is easy to check that $\bar{\mathcal{H}}^*_\eta\subset\mathcal{F}_1+\mathcal{F}_2$, where

$$\mathcal{F}_1=\{F(u-a(\cdot))\mathbf{1}[\cdot\le v]:a\in\mathcal{R};\ u,v\in\mathbb{R}\},$$

$$\mathcal{F}_2=\Bigl\{\int_{-\infty}^v\mathbf{1}[\cdot\le u-a(y)]f(y)\,dy:a\in\mathcal{R};\ u,v\in\mathbb{R}\Bigr\}.$$

Since subsets and sums of Donsker classes are Donsker, it suffices to show that $\mathcal{F}_1$ and $\mathcal{F}_2$ are Donsker classes. For this, it is enough to show that the classes $\mathcal{F}_1$ and $\mathcal{F}_2$ have $L_2(F)$-bracketing numbers with polynomial growth. For $\mathcal{F}_1$, take brackets of the form $[b_*,b^*]=[F(u_*-a^*(\cdot))\mathbf{1}[\cdot\le v_*],\ F(u^*-a_*(\cdot))\mathbf{1}[\cdot\le v^*]]$, where $[a_*,a^*]$ is an $(\epsilon/\|f\|_\infty)$-bracket for $\mathcal{R}$; $v_*$, $v^*$ are chosen such that $F(v^*)-F(v_*)\le\epsilon^2$; and $u_*$, $u^*$ are chosen such that either $u^*-u_*\le\epsilon/\|f\|_\infty$, or $u_*=-\infty$ and (i) $\int F^2(u^*+A_{\mathcal{R}}(1+|y|))f(y)\,dy\le\epsilon^2$, or $u^*=\infty$ and (ii) $\int(1-F(u_*-A_{\mathcal{R}}(1+|y|)))^2f(y)\,dy\le\epsilon^2$. Then $[b_*,b^*]$ is a $3\epsilon$-bracket for $\mathcal{F}_1$. Since $F$ has finite second moment, $t^2F(t)(1-F(t))\to0$ as $|t|\to\infty$. Using this, it is easy to see that $u^*$ in (i) can be chosen proportional to $-1/\epsilon$ and $u_*$ in (ii) can be taken proportional to $1/\epsilon$. Thus, under (5.1), we can cover $\mathcal{F}_1$ with $O(\epsilon^{-j-4})$ brackets of this form.
For J^2, take brackets of the form 



[K,b* 



/Vf PV* 
l[z <u^- a*{y)]f{y)dy, / l[z <u* - a^{y)]f{y)dy 
-00 J —00 



where [a*, a*] is an e^/||/||oo-bracket for TZ; F{v*) — F(f*) < e^; and < 
V* are chosen such that either — < e^/||/||oo5 or v^: = —00 and (i) 
jFiv* + Anil + \y\))fiy) dy < or v* = 00 and (ii) /(I - F(t;, - A^il + 
|y|)))^/(y) ^ Then [5*, 6*] is a 3e-bracket for J^2- It is easy to check 
that, under (5.1), we can cover J^2 with 0{e~'^^~^) brackets of this form. 

This shows that condition (5.1) implies that $\bar{\mathcal{H}}^*_\eta$ is Donsker. In view of Remark 4.1, we then obtain that condition (4.8) is met. Using the moment inequality and interchanging the order of integration, we can bound the left-hand side of condition (4.9) by

$$\sup_{u\in\mathbb{R}}\int\bigl|F(u-r_\tau(\mathbf{x}_{-1},y))-F(u-r_\vartheta(\mathbf{x}_{-1},y))\bigr|f(y-r_\vartheta(\mathbf{x}))\,dy\le\|f\|_\infty\int|r_\tau(\mathbf{x}_{-1},y)-r_\vartheta(\mathbf{x}_{-1},y)|f(y-r_\vartheta(\mathbf{x}))\,dy.$$

Thus, we have (4.9) in view of the Lebesgue dominated convergence theorem and the growth condition (4.2). Lemmas 4.1 and 4.2 now imply conditions (2.9) and (2.10) of Theorem 2.2. Finally, (2.11) is implied by the two conditions

$$\sup_{\tau,t,u}\Bigl|\int_{-\infty}^tF\bigl(u-r_\tau(\mathbf{x}_{-1},y+r_\tau(\mathbf{x}))\bigr)\bigl(f(y-s)-f(y)+sf'(y)\bigr)\,dy\Bigr|=O(s^2)\tag{5.2}$$

and

$$\sup_{\tau,u}\Bigl|\int\bigl(F(u-r_\tau(\mathbf{x}_{-1},y)-s)-F(u-r_\tau(\mathbf{x}_{-1},y))+sf(u-r_\tau(\mathbf{x}_{-1},y))\bigr)f(y-r_\tau(\mathbf{x}))\,dy\Bigr|=O(s^2),$$

where the suprema extend over all real $t$ and $u$ and all $\tau$ with $|\tau-\vartheta|<\Delta$. The latter condition is satisfied if $f$ is Lipschitz. If we set $a_\tau(y)=r_\tau(\mathbf{x}_{-1},y+r_\tau(\mathbf{x}))$ and use (4.13), we obtain that the integral in (5.2) can be written

$$-s\int_0^1\Bigl(\int_{-\infty}^{t-ws}F(u-a_\tau(y+ws))f'(y)\,dy-\int_{-\infty}^tF(u-a_\tau(y))f'(y)\,dy\Bigr)\,dw.$$

If $f$ is Lipschitz, so that $f'$ is bounded, we see that condition (5.2) is implied by

$$\sup_{|\tau-\vartheta|<\Delta}\int|r_\tau(\mathbf{x}_{-1},y+s)-r_\tau(\mathbf{x}_{-1},y)|\,|f'(y-r_\tau(\mathbf{x}))|\,dy=O(s).\tag{5.3}$$

If $f$ has finite Fisher information, then $f'$ is integrable and a sufficient condition for (5.3) is that there is a constant $L$ such that

$$|r_\tau(\mathbf{x}_{-1},y_1)-r_\tau(\mathbf{x}_{-1},y_2)|\le L|y_1-y_2|,\qquad y_1,y_2\in\mathbb{R};\ |\tau-\vartheta|<\Delta.\tag{5.4}$$

Hence, Theorem 2.2 gives the following stochastic expansions for $\hat F_{\mathbf{x}}$ and $\hat F_{\mathbf{x}w}$.
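Theorem 5.1. Let Assumptions B, F, K and R hold and let $f$ be Lipschitz. Suppose that (4.4), the growth conditions (4.2) and (5.1), and (5.3) or (5.4) hold. Then

$$\sup_{t,u\in\mathbb{R}}\Bigl|\hat F_{\mathbf{x}}(t,u)-F_{\mathbf{x}}(t,u)-\frac1n\sum_{i=1}^nh^0_{t,u}(\varepsilon_i)-[D^0_{t,u}]^{\top}(\hat\vartheta-\vartheta)\Bigr|=o_p(n^{-1/2}),$$

$$\sup_{t,u\in\mathbb{R}}\Bigl|\hat F_{\mathbf{x}w}(t,u)-F_{\mathbf{x}}(t,u)-\frac1n\sum_{i=1}^nh^*_{t,u}(\varepsilon_i)-[D^*_{t,u}]^{\top}(\hat\vartheta-\vartheta)\Bigr|=o_p(n^{-1/2}),$$

where $D^0_{t,u}=\dot\psi_{t,u}-B(h^0_{t,u})$ and $D^*_{t,u}=\dot\psi_{t,u}-B(h^*_{t,u})$ and where

$$\bar h_{t,u}(y)=F\bigl(u-r_\vartheta(\mathbf{x}_{-1},y+r_\vartheta(\mathbf{x}))\bigr)\mathbf{1}[y\le t-r_\vartheta(\mathbf{x})]+\int_{-\infty}^t\mathbf{1}[y\le u-r_\vartheta(\mathbf{x}_{-1},z)]f(z-r_\vartheta(\mathbf{x}))\,dz;$$

$$\dot\psi_{t,u}=-\int_{-\infty}^tF(u-r_\vartheta(\mathbf{x}_{-1},y))f'(y-r_\vartheta(\mathbf{x}))\,dy\ \dot r_\vartheta(\mathbf{x})-\int_{-\infty}^tf(u-r_\vartheta(\mathbf{x}_{-1},y))f(y-r_\vartheta(\mathbf{x}))\dot r_\vartheta(\mathbf{x}_{-1},y)\,dy.$$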


Example 5.1. Consider the AR(1) model, in which $r_\vartheta(x)=\vartheta x$ and $|\vartheta|<1$. Clearly, Assumption R and conditions (4.2), (4.4) and (5.1) hold. Also, condition (5.4) holds with $L=1$. Thus, if $f$ has finite Fisher information for location and is Lipschitz, then all the assumptions of Theorem 5.1 can be met.

Example 5.2. Consider the EXPAR(1) model, in which $\vartheta=(\vartheta_1,\vartheta_2)$ with $\vartheta_1<1$, and $r_\vartheta(x)=(\vartheta_1+\vartheta_2\exp(-\gamma x^2))x$. Here the exponent $\gamma$ is assumed known. The assumptions of Theorem 5.1 can be met if $f$ has finite Fisher information for location and is Lipschitz. Clearly, Assumption R and conditions (4.2), (4.4) and (5.1) hold. Moreover, condition (5.4) is satisfied.

We have phrased the conditions on $r_\tau$ in Theorem 5.1 sufficiently general to cover discontinuous autoregression functions such as those appearing in SETAR models.

Example 5.3. Consider the SETAR(2, 1, 1) model with known threshold $\zeta$. In this model, $\vartheta=(\vartheta_1,\vartheta_2)$ with $\vartheta_1<1$, $\vartheta_2<1$, $\vartheta_1\vartheta_2<1$, and $r_\vartheta(x)=\vartheta_1x\mathbf{1}[x\le\zeta]+\vartheta_2x\mathbf{1}[x>\zeta]$. It is easily seen that Assumption R and the conditions (4.2), (4.4) and (5.1) hold. Suppose now that $f$ has finite Fisher information for location and is Lipschitz. If $\zeta=0$, then the Lipschitz condition (5.4) holds. If $\zeta\ne0$, then the Lipschitz condition (5.4) does not hold, but one has

$$|r_\tau(y+s)-r_\tau(y)|\le(|\tau_1|+|\tau_2|)\bigl(|s|+\mathbf{1}[|y-\zeta|\le|s|]\bigr).$$

This and the fact that $f'$ is bounded and integrable yield (5.3).

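The one-dimensional lag-two conditional distribution function $G_{\mathbf{x}}(u)=F_{\mathbf{x}}(\infty,u)$ at $u$ of $X_{n+2}$, given $\mathbf{X}_n=\mathbf{x}$, is

$$G_{\mathbf{x}}(u)=\int F(u-r_\vartheta(\mathbf{x}_{-1},y))f(y-r_\vartheta(\mathbf{x}))\,dy.$$

We estimate $G_{\mathbf{x}}(u)$ by

$$\hat G_{\mathbf{x}}(u)=\iint\mathbf{1}\bigl[z+r_{\hat\vartheta}(\mathbf{x}_{-1},y+r_{\hat\vartheta}(\mathbf{x}))\le u\bigr]\hat f(y)\hat f(z)\,dy\,dz$$

and its weighted version

$$\hat G_{\mathbf{x}w}(u)=\iint\mathbf{1}\bigl[z+r_{\hat\vartheta}(\mathbf{x}_{-1},y+r_{\hat\vartheta}(\mathbf{x}))\le u\bigr]\hat f_w(y)\hat f_w(z)\,dy\,dz.$$

We obtain stochastic expansions as in Theorem 5.1, with $t$ replaced by $\infty$.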
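
The double integral defining the unweighted estimator is a von Mises statistic of the residuals. As a hedged illustration, the sketch below specializes to an AR(1) model, $r_\vartheta(x)=\vartheta x$, and assumes a Gaussian kernel with bandwidth $b$; both are choices made for the example. Under these assumptions the integral reduces to a double sum: a pair of kernel components centered at residuals $e_i$ and $e_j$ contributes $\Phi\bigl((u-\hat\vartheta^2x-\hat\vartheta e_i-e_j)/(b\sqrt{1+\hat\vartheta^2})\bigr)$. The function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lag_two_cdf_ar1(residuals, theta_hat, x, u, bandwidth):
    """Unweighted smoothed von Mises estimate of P(X_{n+2} <= u | X_n = x)
    in an AR(1) model, for a Gaussian kernel (assumption).

    The double integral of 1[z + theta*y + theta^2*x <= u] against the
    product of two kernel density estimates becomes a double sum of normal
    distribution function values, one per ordered pair of residuals."""
    n = len(residuals)
    scale = bandwidth * math.sqrt(1.0 + theta_hat ** 2)
    shift = u - theta_hat ** 2 * x
    total = 0.0
    for e1 in residuals:        # residual playing the role of eps_1
        for e2 in residuals:    # residual playing the role of eps_2
            total += normal_cdf((shift - theta_hat * e1 - e2) / scale)
    return total / (n * n)


if __name__ == "__main__":
    toy_residuals = [-1.0, 0.0, 1.0]  # toy data, for illustration only
    print(lag_two_cdf_ar1(toy_residuals, theta_hat=0.5, x=0.0, u=0.0, bandwidth=0.3))
```

The double loop costs $O(n^2)$ evaluations, which is the usual price of a von Mises statistic of order two; the weighted version would attach the product of the two residual weights to each pair.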

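Theorem 5.2. Let Assumptions B, F, K and R hold and let $f$ be Lipschitz. Suppose that (4.4), the growth conditions (4.2) and (5.1) and (5.3) hold. Then

$$\sup_{u\in\mathbb{R}}\Bigl|\hat G_{\mathbf{x}}(u)-G_{\mathbf{x}}(u)-\frac1n\sum_{i=1}^nh^0_u(\varepsilon_i)-[D^0_u]^{\top}(\hat\vartheta-\vartheta)\Bigr|=o_p(n^{-1/2});$$

$$\sup_{u\in\mathbb{R}}\Bigl|\hat G_{\mathbf{x}w}(u)-G_{\mathbf{x}}(u)-\frac1n\sum_{i=1}^nh^*_u(\varepsilon_i)-[D^*_u]^{\top}(\hat\vartheta-\vartheta)\Bigr|=o_p(n^{-1/2}),$$

where $D^0_u=\dot\psi_u-B(h^0_u)$ and $D^*_u=\dot\psi_u-B(h^*_u)$ and where

$$\bar h_u(y)=F\bigl(u-r_\vartheta(\mathbf{x}_{-1},y+r_\vartheta(\mathbf{x}))\bigr)+\int\mathbf{1}[y\le u-r_\vartheta(\mathbf{x}_{-1},z)]f(z-r_\vartheta(\mathbf{x}))\,dz;$$

$$\dot\psi_u=-\int F(u-r_\vartheta(\mathbf{x}_{-1},y))f'(y-r_\vartheta(\mathbf{x}))\,dy\ \dot r_\vartheta(\mathbf{x})-\int f(u-r_\vartheta(\mathbf{x}_{-1},y))f(y-r_\vartheta(\mathbf{x}))\dot r_\vartheta(\mathbf{x}_{-1},y)\,dy.$$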

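Example 5.4. We apply Theorem 5.2 to the conditional quantile function of lag two. The conditional $v$-quantile of $X_{n+2}$, given $\mathbf{X}_n=\mathbf{x}$, is the left-inverse $G_{\mathbf{x}}^{-1}(v)$ of $G_{\mathbf{x}}$ at $v$. We estimate it by $\hat G_{\mathbf{x}w}^{-1}(v)$. Since $f$ was assumed positive, $G_{\mathbf{x}}$ has a positive density

$$g_{\mathbf{x}}(u)=\int f(u-r_\vartheta(\mathbf{x}_{-1},y))f(y-r_\vartheta(\mathbf{x}))\,dy.$$

Let $0<c<d<1$. As in Example 3.1, we use Proposition 1 of [10] to obtain the stochastic expansion

$$\sup_{v\in[c,d]}\Bigl|\hat G_{\mathbf{x}w}^{-1}(v)-G_{\mathbf{x}}^{-1}(v)+\frac{1}{g_{\mathbf{x}}(G_{\mathbf{x}}^{-1}(v))}\Bigl(\frac1n\sum_{i=1}^nh^*_u(\varepsilon_i)+[D^*_u]^{\top}(\hat\vartheta-\vartheta)\Bigr)\Big|_{u=G_{\mathbf{x}}^{-1}(v)}\Bigr|=o_p(n^{-1/2}).$$

It follows that the smoothed and weighted lag-two conditional quantile process

$$\bigl\{n^{1/2}\bigl(\hat G_{\mathbf{x}w}^{-1}(v)-G_{\mathbf{x}}^{-1}(v)\bigr):v\in[c,d]\bigr\}$$

converges weakly in $\ell_\infty([c,d])$ to a centered Gaussian process.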

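Example 5.5. Consider the AR(1) model $X_i=\vartheta X_{i-1}+\varepsilon_i$ with $|\vartheta|<1$. Let us also take $\hat\vartheta$ to be the sample correlation coefficient, which satisfies (3.2). We are interested in predicting the probability $G_x(u)=P(X_{n+2}\le u\mid X_n=x)$, which can be expressed as

$$G_x(u)=P(\varepsilon_2+\vartheta\varepsilon_1+\vartheta^2x\le u)=\int F(u-\vartheta y-\vartheta^2x)f(y)\,dy.$$

We assume that $f$ has finite Fisher information for location and is Lipschitz, so that the requirements of Theorem 5.1 and, hence, of Theorem 5.2 are met as demonstrated in Example 5.1. The smoothed von Mises estimator is

$$\hat G_x(u)=\hat F_x(\infty,u)=\iint\mathbf{1}[z+\hat\vartheta y+\hat\vartheta^2x\le u]\hat f(y)\hat f(z)\,dy\,dz,$$

and its weighted counterpart is

$$\hat G_{xw}(u)=\hat F_{xw}(\infty,u)=\iint\mathbf{1}[z+\hat\vartheta y+\hat\vartheta^2x\le u]\hat f_w(y)\hat f_w(z)\,dy\,dz.$$

Since $\dot r_\vartheta(x)=x$ and $E[X]=0$, we see that $B(g)=0$ for all $g\in L_2(F)$. Thus, we obtain from Theorem 5.2 and from expansion (3.2) for $\hat\vartheta$ that

$$\hat G_x(u)=G_x(u)+\frac1n\sum_{i=1}^n\Bigl(\bar h_u(\varepsilon_i)-2G_x(u)+\dot\psi_u\frac{1-\vartheta^2}{\sigma^2}X_{i-1}\varepsilon_i\Bigr)+o_p(n^{-1/2})$$

and

$$\hat G_{xw}(u)=G_x(u)+\frac1n\sum_{i=1}^n\Bigl(\bar h_u(\varepsilon_i)-2G_x(u)-\frac{c_u}{\sigma^2}\varepsilon_i+\dot\psi_u\frac{1-\vartheta^2}{\sigma^2}X_{i-1}\varepsilon_i\Bigr)+o_p(n^{-1/2}),$$

where $\dot\psi_u=-E[(\varepsilon+2\vartheta x)f(u-\vartheta\varepsilon-\vartheta^2x)]$, $c_u=E[\varepsilon\bar h_u(\varepsilon)]$ and

$$\bar h_u(\varepsilon)=\bar h_{\infty,u}(\varepsilon)=\begin{cases}F(u-\vartheta\varepsilon-\vartheta^2x)+F((u-\varepsilon-\vartheta^2x)/\vartheta),&\vartheta>0,\\F(u)+\mathbf{1}[\varepsilon\le u],&\vartheta=0,\\F(u-\vartheta\varepsilon-\vartheta^2x)+1-F((u-\varepsilon-\vartheta^2x)/\vartheta),&\vartheta<0.\end{cases}$$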
Fig. 1. The asymptotic relative efficiency $\tau_w^2/\tau^2$ of the unweighted versus the weighted estimator for $u=0$ for various values of $\vartheta$ and $x$.

Consequently, $n^{1/2}(\hat G_x(u)-G_x(u))$ is asymptotically normal with mean zero and variance $\tau^2=\operatorname{Var}(\bar h_u(\varepsilon))+\dot\psi_u^2(1-\vartheta^2)$, while $n^{1/2}(\hat G_{xw}(u)-G_x(u))$ is asymptotically normal with mean zero and variance $\tau_w^2=\tau^2-c_u^2/\sigma^2$. Therefore, the weighted version has a smaller asymptotic variance unless $c_u=0$. The variance reductions can be considerable. Figure 1 is a graph of the asymptotic relative efficiency $\tau_w^2/\tau^2$ of the unweighted with respect to the weighted estimator as a function of $\vartheta$ (ranging from 0.05 to 0.95) and $x$ (ranging from 0 to 2) in the case of the standard normal density $f$ and $u=0$. As one can see from the graph, the ratio is always below 0.3 and can be as small as 0.0151. Thus, variance reductions of over 98% are possible.
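
The ratio plotted in Figure 1 can be recomputed pointwise from the formulas of Example 5.5. The following sketch is an illustration under assumptions: standard normal innovations ($\sigma^2=1$), $\vartheta>0$, the expressions for $\bar h_u$, $\dot\psi_u$, $c_u$, $\tau^2$ and $\tau_w^2$ as stated in the example, and a trapezoidal rule on $[-10,10]$ for the expectations. The function names are hypothetical.

```python
import math


def normal_pdf(y):
    """Standard normal density."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def normal_cdf(y):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def lag_two_variances(theta, x, u=0.0, lo=-10.0, hi=10.0, steps=20000):
    """Return (tau^2, tau_w^2) for the AR(1) lag-two estimators with
    standard normal innovations and theta > 0, by numerical integration."""
    h = (hi - lo) / steps
    mean_h = mean_h2 = c_u = psi_dot = 0.0
    for k in range(steps + 1):
        e = lo + k * h
        w = normal_pdf(e) * h * (0.5 if k in (0, steps) else 1.0)
        arg = u - theta * e - theta ** 2 * x
        # h-bar for theta > 0, as in Example 5.5
        h_bar = normal_cdf(arg) + normal_cdf((u - e - theta ** 2 * x) / theta)
        mean_h += w * h_bar
        mean_h2 += w * h_bar ** 2
        c_u += w * e * h_bar
        psi_dot -= w * (e + 2.0 * theta * x) * normal_pdf(arg)
    var_h = mean_h2 - mean_h ** 2
    tau2 = var_h + psi_dot ** 2 * (1.0 - theta ** 2)
    tau2_w = tau2 - c_u ** 2  # c_u^2 / sigma^2 with sigma^2 = 1
    return tau2, tau2_w


if __name__ == "__main__":
    for theta in (0.25, 0.5, 0.75):
        tau2, tau2_w = lag_two_variances(theta, x=0.0)
        print(theta, tau2, tau2_w, tau2_w / tau2)
```

Looping over a grid of $\vartheta$ and $x$ values with this function gives a table that can be compared with the figure.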

6. Efficiency. In this section we prove that the weighted versions of our estimators are efficient. We recall that, among all "regular" estimators, an estimator for a vector-valued functional is efficient in the sense of Hájek and Le Cam if its standardized error is asymptotically maximally concentrated in symmetric convex sets. In a locally asymptotically normal model, an estimator for a differentiable functional is regular and efficient if and only if it is asymptotically linear with influence function equal to the canonical gradient of the functional. For our nonlinear autoregressive model, these concepts and the explicit form of the characterization are given in [32] for differentiable functionals of both the autoregression parameter and the innovation density. Here we are interested in estimating the functional

$$\kappa(\vartheta,f)=\psi(h_\vartheta,f)=\int h_\vartheta(y_1,\dots,y_m)\prod_{j=1}^mf(y_j)\,dy_1\cdots dy_m,$$

where $\{h_\tau:|\tau-\vartheta|<\Delta\}$ is a class of measurable functions from $\mathbb{R}^m$ into $\mathbb{R}$. We assume that the class has an envelope $H$ of the form $H(y_1,\dots,y_m)=V(y_1)\cdots V(y_m)$ with $V$ satisfying Assumption V. We use $\bar h_\tau$ as defined in Section 2 and assume that

$$\int(\bar h_\tau-\bar h_\vartheta)^2\,dF\to0\quad\text{as }\tau\to\vartheta.\tag{6.1}$$

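Lemma 6.1. Suppose, in addition to the above, that $\tau\mapsto\psi(h_\tau,f)$ is differentiable at $\vartheta$ with gradient $\dot\psi_\vartheta$. Let $\vartheta_n$ be a sequence of parameter values such that $n^{1/2}(\vartheta_n-\vartheta)\to u$. Let $f_n$ be a sequence of densities such that $\|f_n-f\|_{V^2}\to0$ and

$$\int\Bigl(n^{1/2}\bigl(f_n^{1/2}(y)-f^{1/2}(y)\bigr)-\tfrac12v(y)f^{1/2}(y)\Bigr)^2dy\to0$$

for some $v\in L_2(F)$. Then

$$n^{1/2}\bigl(\psi(h_{\vartheta_n},f_n)-\psi(h_\vartheta,f)\bigr)\to\dot\psi_\vartheta^{\top}u+\int\bar h_\vartheta v\,dF.$$

Proof. Express $\psi(h_{\vartheta_n},f_n)-\psi(h_\vartheta,f)$ as the sum $T_1+T_2+T_3$ with

$$T_1=\psi(h_{\vartheta_n},f_n)-\psi(h_{\vartheta_n},f)-\int\bar h_{\vartheta_n}(y)\bigl(f_n(y)-f(y)\bigr)\,dy,$$

$$T_2=\int\bar h_{\vartheta_n}(y)\bigl(f_n(y)-f(y)\bigr)\,dy,$$

$$T_3=\psi(h_{\vartheta_n},f)-\psi(h_\vartheta,f).$$

We have $n^{1/2}T_3\to\dot\psi_\vartheta^{\top}u$. The argument given in the proof of Theorem 2.1 shows that $T_1=O(\|f_n-f\|_V^2)$. Writing $s_n=f_n^{1/2}$, $s=f^{1/2}$ and $f_n-f=(s_n-s)(s_n+s)$, and applying the Cauchy-Schwarz inequality, we obtain that $\|f_n-f\|_V^2\le2(\|f_n\|_{V^2}+\|f\|_{V^2})\|(s_n-s)^2\|_1=O(n^{-1})$. Thus, $n^{1/2}T_1\to0$. Finally, $n^{1/2}T_2\to\int\bar h_\vartheta v\,dF$ by the same argument as for Lemma 7.2. □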

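As in Section 2, let $V$ be the set of all $v\in L_2(F)$ with $\int v(y)\,dF(y)=0$ and $\int yv(y)\,dF(y)=0$. For each $v\in V$, there is a sequence $f_n=f_{nv}$ of zero mean densities as required in the previous lemma. As shown in [32], these densities can be chosen to also satisfy $\|(f_n-f)/f\|_\infty\to0$ and to have finite Fisher information for location if $f$ has it. We also require now that (3.1) in [32] hold: for every sequence $\vartheta_n$ and $f_n=f_{nv}$ as above, the corresponding stationary density converges in $L_1$ to the density of $G$. Under this assumption, one has local asymptotic normality. As seen in Section 2, under appropriate conditions, the estimator $\psi(h_{\hat\vartheta},\hat f_w)$ has the stochastic expansion

$$\psi(h_{\hat\vartheta},\hat f_w)=\psi(h_\vartheta,f)+\frac1n\sum_{i=1}^nh^*_\vartheta(\varepsilon_i)+\bigl(\dot\psi_\vartheta-\mu E[\bar h_\vartheta(\varepsilon)\ell^*(\varepsilon)]\bigr)^{\top}(\hat\vartheta-\vartheta)+o_p(n^{-1/2}),$$

with $\mu=E[\dot r_\vartheta(\mathbf{X})]$ and $h^*_\vartheta(y)=\bar h_\vartheta(y)-\int\bar h_\vartheta\,dF-\sigma^{-2}y\int u\bar h_\vartheta(u)\,dF(u)$. Recall that $h^*_\vartheta$ is the projection of $\bar h_\vartheta$ onto $V$. The projection of $\ell$ onto $V$ is $\ell^*(y)=\ell(y)-\sigma^{-2}y$. If $\hat\vartheta$ is efficient, then, by characterization (3.12) of [32], it has the stochastic expansion

$$\hat\vartheta=\vartheta+\Lambda^{-1}\frac1n\sum_{i=1}^nS(\mathbf{X}_{i-1},\varepsilon_i)+o_p(n^{-1/2}),$$

where $S(\mathbf{X},\varepsilon)=\dot r_\vartheta(\mathbf{X})\ell(\varepsilon)-\mu\ell^*(\varepsilon)$ and $\Lambda=E[S(\mathbf{X},\varepsilon)S(\mathbf{X},\varepsilon)^{\top}]=JR-J^*\mu\mu^{\top}$, with $J$ and $J^*$ the second moments of $\ell(\varepsilon)$ and $\ell^*(\varepsilon)$, and $R=E[\dot r_\vartheta(\mathbf{X})\dot r_\vartheta(\mathbf{X})^{\top}]$. If an efficient estimator $\hat\vartheta$ is used in $\psi(h_{\hat\vartheta},\hat f_w)$, we obtain the stochastic expansion

$$\psi(h_{\hat\vartheta},\hat f_w)=\psi(h_\vartheta,f)+\frac1n\sum_{i=1}^nS_\#(\mathbf{X}_{i-1},\varepsilon_i)+o_p(n^{-1/2})$$

with $S_\#(\mathbf{X},\varepsilon)=h^*_\vartheta(\varepsilon)+M^{\top}S(\mathbf{X},\varepsilon)$ and

$$M=\Lambda^{-1}\Bigl(\dot\psi_\vartheta-\mu\int h^*_\vartheta\ell\,dF\Bigr).$$

For $v\in V$, we have

$$E[S_\#(\mathbf{X},\varepsilon)v(\varepsilon)]=\int h^*_\vartheta v\,dF+M^{\top}\mu\Bigl(\int\ell v\,dF-\int\ell^*v\,dF\Bigr)=\int h^*_\vartheta v\,dF=\int\bar h_\vartheta v\,dF.$$

Furthermore,

$$E[S_\#(\mathbf{X},\varepsilon)\dot r_\vartheta(\mathbf{X})\ell(\varepsilon)]=\mu\int h^*_\vartheta\ell\,dF+(JR-J^*\mu\mu^{\top})M=\dot\psi_\vartheta.$$

This shows that, for all vectors $u$ and all $v\in V$,

$$E\bigl[S_\#(\mathbf{X},\varepsilon)\bigl(u^{\top}\dot r_\vartheta(\mathbf{X})\ell(\varepsilon)+v(\varepsilon)\bigr)\bigr]=u^{\top}\dot\psi_\vartheta+\int\bar h_\vartheta v\,dF.$$

Since $S_\#(\mathbf{X},\varepsilon)$ is of the form $S_\#(\mathbf{X},\varepsilon)=u_0^{\top}\dot r_\vartheta(\mathbf{X})\ell(\varepsilon)+v_0(\varepsilon)$ for some vector $u_0$ and $v_0\in V$, we obtain that $S_\#(\mathbf{X},\varepsilon)$ is the canonical gradient of the functional $\psi(h_\vartheta,f)$. Hence, $\psi(h_{\hat\vartheta},\hat f_w)$ is efficient by the characterization (3.5) in [32], provided $S_\#(\mathbf{X},\varepsilon)$ is almost surely not zero.

The stochastic expansion of $\psi(h_{\hat\vartheta},\hat f_w)$ given above implies that $n^{1/2}(\psi(h_{\hat\vartheta},\hat f_w)-\psi(h_\vartheta,f))$ is asymptotically normal with variance

$$E[(h^*_\vartheta(\varepsilon))^2]+\bigl(\dot\psi_\vartheta-\mu E[\bar h_\vartheta(\varepsilon)\ell^*(\varepsilon)]\bigr)^{\top}\Lambda^{-1}\bigl(\dot\psi_\vartheta-\mu E[\bar h_\vartheta(\varepsilon)\ell^*(\varepsilon)]\bigr).$$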

7. Technical details. We begin with a characterization of compact sub- 
sets of L2{y) for a measure v with Lebesgue density. 

Lemma 7.1. Let u he a finite measure with Lebesgue density ip. Let 
W G L2{y) satisfy 

(7.1) j{W{x-s)-W{x)fu{dx)^Q ass^O. 

Then a subset Q 0/^2(2^) with envelope W is totally bounded if and only if 

(7.2) sup {g{x — s) — g{x))'^ ^{dx) ^ as s — > 0. 
gegJ 

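Proof. Let $\lambda$ denote the Lebesgue measure. Let $\bar{\mathcal{G}}$ denote the closure
of $\mathcal{G}$ in $L_2(\nu)$. Clearly, $\mathcal{G}$ is totally bounded if and only if $\bar{\mathcal{G}}$ is compact in
$L_2(\nu)$. The latter is equivalent to compactness of $\bar{\mathcal{G}}\sqrt{\varphi} = \{g\sqrt{\varphi} : g \in \bar{\mathcal{G}}\}$ in $L_2(\lambda)$. By the
Fréchet-Kolmogorov theorem (see [45], page 275), compactness of $\bar{\mathcal{G}}\sqrt{\varphi}$ is
equivalent to

(7.3)   $\sup_{g\in\mathcal{G}} \int g^2\varphi\,d\lambda < \infty$,

(7.4)   $\sup_{g\in\mathcal{G}} \int \bigl(g(x-s)\sqrt{\varphi(x-s)} - g(x)\sqrt{\varphi(x)}\bigr)^2\,dx \to 0$ as $s \to 0$,

(7.5)   $\sup_{g\in\mathcal{G}} \int_{|x|>M} g^2(x)\varphi(x)\,dx \to 0$ as $M \to \infty$.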

Since $\mathcal{G}$ has envelope $W$ in $L_2(\nu)$, properties (7.3) and (7.5) are automatically
satisfied. Since

$$\bigl(\bigl(g(x-s)\sqrt{\varphi(x-s)} - g(x)\sqrt{\varphi(x)}\bigr) - (g(x-s) - g(x))\sqrt{\varphi(x)}\bigr)^2 = g^2(x-s)\bigl(\sqrt{\varphi(x-s)} - \sqrt{\varphi(x)}\bigr)^2,$$

properties (7.4) and (7.2) are equivalent if

(7.6)   $\int W^2(x-s)\bigl(\sqrt{\varphi(x-s)} - \sqrt{\varphi(x)}\bigr)^2\,dx \to 0$ as $s \to 0$.

The above identity with $g = W$, together with continuity of translation in
$L_2(\lambda)$, for which we refer to [30], Theorem 9.5, shows that (7.6) and (7.1)
are equivalent. □
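
The step from the pointwise identity to the equivalence of (7.4) and (7.2) under (7.6) can be spelled out as follows. This is only a sketch, and the abbreviations $a_{g,s}$, $b_{g,s}$ and $c_s$ are introduced here for illustration and do not appear in the original. Write

$$a_{g,s}(x) = g(x-s)\sqrt{\varphi(x-s)} - g(x)\sqrt{\varphi(x)}, \qquad b_{g,s}(x) = (g(x-s) - g(x))\sqrt{\varphi(x)},$$

so that $\|a_{g,s}\|^2_{L_2(\lambda)}$ is the integral in (7.4) and $\|b_{g,s}\|^2_{L_2(\lambda)}$ is the integral in (7.2). By the triangle inequality, the identity above, and $|g| \le W$,

$$\bigl|\,\|a_{g,s}\|_{L_2(\lambda)} - \|b_{g,s}\|_{L_2(\lambda)}\bigr| \le \|a_{g,s} - b_{g,s}\|_{L_2(\lambda)} \le \Bigl(\int W^2(x-s)\bigl(\sqrt{\varphi(x-s)} - \sqrt{\varphi(x)}\bigr)^2\,dx\Bigr)^{1/2} = c_s .$$

The bound $c_s$ does not depend on $g$, so $\sup_{g}\|a_{g,s}\| \le \sup_{g}\|b_{g,s}\| + c_s$ and $\sup_{g}\|b_{g,s}\| \le \sup_{g}\|a_{g,s}\| + c_s$. If $c_s \to 0$, which is (7.6), one supremum tends to zero exactly when the other does.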

The next lemma discusses uniform differentiability of integrals with re- 
spect to Hellinger differentiable densities. 

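Lemma 7.2. Let $\{p_\tau : |\tau - \vartheta| < A\}$ be a family of densities with respect
to some measure $\nu$. Let $p_\tau$ be Hellinger differentiable at $\vartheta$ with Hellinger
derivative $\chi$. Let $W$ be a nonnegative function such that

(7.7)   $\int W^2 p_\tau\,d\nu \to \int W^2 p_\vartheta\,d\nu$ as $\tau \to \vartheta$.

Then

(7.8)   $\sup_{|g|\le W} \Bigl|\int g\bigl(p_\tau - p_\vartheta - \chi^\top(\tau-\vartheta)p_\vartheta\bigr)\,d\nu\Bigr| = o(|\tau-\vartheta|)$.

Moreover, if $\{g_\tau : |\tau - \vartheta| < A\}$ has envelope $W$ and $\int (g_\tau - g_\vartheta)^2 p_\vartheta\,d\nu \to 0$,
then

(7.9)   $\int g_\tau(p_\tau - p_\vartheta)\,d\nu = \int g_\vartheta\chi^\top p_\vartheta\,d\nu\,(\tau-\vartheta) + o(|\tau-\vartheta|)$.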



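Proof. Hellinger differentiability implies that $p_\tau \to p_\vartheta$ in $\nu$-measure.
This and (7.7) yield $\int W^2|p_\tau - p_\vartheta|\,d\nu \to 0$. Let $s_\tau = p_\tau^{1/2}$ and $r_\tau = s_\tau - s_\vartheta - \frac{1}{2}\chi^\top(\tau-\vartheta)s_\vartheta$.
Hellinger differentiability means that $\int r_\tau^2\,d\nu = o(|\tau-\vartheta|^2)$.
Since $p_\tau - p_\vartheta - \chi^\top(\tau-\vartheta)p_\vartheta = r_\tau(s_\tau + s_\vartheta) + \frac{1}{2}\chi^\top(\tau-\vartheta)s_\vartheta(s_\tau - s_\vartheta)$, an application
of the Cauchy-Schwarz inequality shows that the square of the
left-hand side of (7.8) can be bounded by

$$2\int W^2(s_\tau + s_\vartheta)^2\,d\nu \int r_\tau^2\,d\nu + \frac{1}{2}\int (\chi^\top(\tau-\vartheta))^2 p_\vartheta\,d\nu \int W^2(s_\tau - s_\vartheta)^2\,d\nu.$$

Using $(s_\tau + s_\vartheta)^2 \le 2(p_\tau + p_\vartheta)$ and $(s_\tau - s_\vartheta)^2 \le |p_\tau - p_\vartheta|$, we obtain (7.8). To
prove (7.9), it therefore remains to show that $\int (g_\tau - g_\vartheta)\chi p_\vartheta\,d\nu \to 0$ as $\tau \to \vartheta$.
But this is an easy consequence of $\int (g_\tau - g_\vartheta)^2 p_\vartheta\,d\nu \to 0$. □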
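
The algebraic identity used in the proof can be verified in two lines. The sketch below uses only the definitions $s_\tau = p_\tau^{1/2}$ and $r_\tau = s_\tau - s_\vartheta - \frac{1}{2}\chi^\top(\tau-\vartheta)s_\vartheta$ from the proof. Since $s_\tau - s_\vartheta = r_\tau + \frac{1}{2}\chi^\top(\tau-\vartheta)s_\vartheta$ and $p_\vartheta = s_\vartheta^2$,

$$\begin{aligned}
p_\tau - p_\vartheta - \chi^\top(\tau-\vartheta)p_\vartheta
&= (s_\tau - s_\vartheta)(s_\tau + s_\vartheta) - \chi^\top(\tau-\vartheta)s_\vartheta^2 \\
&= r_\tau(s_\tau + s_\vartheta) + \tfrac{1}{2}\chi^\top(\tau-\vartheta)s_\vartheta(s_\tau + s_\vartheta - 2s_\vartheta) \\
&= r_\tau(s_\tau + s_\vartheta) + \tfrac{1}{2}\chi^\top(\tau-\vartheta)s_\vartheta(s_\tau - s_\vartheta).
\end{aligned}$$

For the final step of the proof, one possible argument (an assumption about what "easy consequence" refers to) is the Cauchy-Schwarz inequality together with the square integrability of the Hellinger derivative:

$$\Bigl|\int (g_\tau - g_\vartheta)\chi p_\vartheta\,d\nu\Bigr| \le \Bigl(\int (g_\tau - g_\vartheta)^2 p_\vartheta\,d\nu\Bigr)^{1/2}\Bigl(\int |\chi|^2 p_\vartheta\,d\nu\Bigr)^{1/2} \to 0 .$$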